What role does metaethics play in AI alignment and safety? How might paths to AI alignment change given different metaethical views? How do issues in moral epistemology, motivation, and justification affect value alignment? What might be the metaphysical status of suffering and pleasure? What’s the difference between moral realism and anti-realism and how is each view grounded? And just what does any of this really have to do with AI?
The Metaethics of Joy, Suffering, and AI Alignment is the fourth podcast in the new AI Alignment series, hosted by Lucas Perry. For those of you that are new, this series will be covering and exploring the AI alignment problem across a large variety of domains, reflecting the fundamentally interdisciplinary nature of AI alignment. Broadly, we will be having discussions with technical and non-technical researchers across areas such as machine learning, AI safety, governance, coordination, ethics, philosophy, and psychology as they pertain to the project of creating beneficial AI. If this sounds interesting to you, we hope that you will join in the conversations by following us or subscribing to our podcasts on Youtube, SoundCloud, or your preferred podcast site/application.
If you’re interested in exploring the interdisciplinary nature of AI alignment, we suggest you take a look here at a preliminary landscape which begins to map this space.
In this podcast, Lucas spoke with David Pearce and Brian Tomasik. David is a co-founder of the World Transhumanist Association, currently rebranded Humanity+. You might know him for his work on The Hedonistic Imperative, a book focusing on our moral obligation to work towards the abolition of suffering in all sentient life. Brian is a researcher at the Foundational Research Institute. He writes about ethics, animal welfare, and future scenarios on his website “Essays On Reducing Suffering.”
Topics discussed in this episode include:
- What metaethics is and how it ties into AI alignment or not
- Brian and David’s ethics and metaethics
- Moral realism vs antirealism
- Moral epistemology and motivation
- Different paths to and effects on AI alignment given different metaethics
- Moral status of hedonic tones vs preferences
- Can we make moral progress and what would this mean?
- Moving forward given moral uncertainty
Lucas: Hey, everyone. Welcome back to the AI Alignment podcast series with the Future of Life Institute. Today, we’ll be speaking with David Pearce and Brian Tomasik. David is a co-founder of the World Transhumanist Association, rebranded humanity plus, and is a prominent figure within the transhumanism movement in general. You might know him from his work on the Hedonistic Imperative, a book which explores our moral obligation to work towards the abolition of suffering in all sentient life through technological intervention.
Brian Tomasik writes about ethics, animal welfare and for far-future scenarios from a suffering-focused perspective on his website reducing-suffering.org. He has also helped found the Foundational Research Institute, which is a think tank that explores crucial considerations for reducing suffering in the long term future. If you have been finding this podcast interesting or useful, remember to follow us on your preferred listening platform and share the episode on social media. Today, Brian, David, and I speak about metaethics, key concepts and ideas in the space, explore the metaethics of Brian and David, and how this all relates to and is important for AI alignment. This was a super fun and interesting episode and I hope that you find it valuable. With that, I give you Brian Tomasik and David Pearce.
Thank you so much for coming on the podcast.
David: Thank you Lucas.
Brian: Glad to be here.
Lucas: Great. We can start off with you David and then, you Brian and just giving a little bit about your background, the intellectual journey that you’ve been on and how that brought you here today.
David: Yes. My focus has always been on the problem of suffering, very ancient problem, Buddhism and countless other traditions preoccupied by the problem of suffering. I’m also a transhumanist and what transhumanism brings to the problem is suffering is the idea that it’s possible to use technology, in particular biotechnology to phase out suffering, not just in humans throughout the living world and ideally replace them by gradients of intelligent wellbeing. Transhumanism is a very broad movement embracing not just radical mood enrichment but also super longevity and super intelligence. This is what brings me in and us here today in that there is no guarantee that human preoccupations are the problems of suffering are going to overlap with those of post human super intelligence.
Lucas: Awesome, and so you, Brian.
Brian: I’ve been interested in utilitarianism since I was 18 and I discovered the word. I immediately looked it up and was interested to see that the philosophy mirrored some of the things that I had been thinking about up to that point. I became interested in animal ethics and the far future. A year after that, I actually discovered David’s writings of the Hedonistic Imperative, along with other factors. His writings helped to inspire me to care more about suffering relative to the creation of happiness. Since then, I’ve been what you might call suffering-focused, which means I think that the reduction of suffering has more moral priority than other values. I’ve written about both animal ethics including wild animal suffering as well as risks of astronomical future suffering, what are called s-risks. You had a recent podcast episode with Kaj Sotala to talk about s-risks.
I, in general think that from my perspective, one important thing to think about was during AI is what sorts of outcomes could result in large amounts of suffering? We should try to steer away from those possible future scenarios.
Lucas: Given our focuses on AI alignment, I’d like to just offer a little bit of context. Today, this episode will be focusing on ethics. The AI Alignment problem is traditionally seen as something which is prominently something technical. While a large, large portion of it is technical, the end towards which the technical AI is aimed or the ethics which is imbued within it or embodied within it is still an open and difficult question. Broadly, just to have everything defined here, we can understand ethics here just a method of seeking to understand what we ought to do and what counts as moral or good.
The end goal of AI safety is to create beneficial intelligence not undirected intelligence. What beneficial exactly entails is still an open question that largely exist in the domain of ethics. Even if all the technical issues surrounding the creation of an artificial general intelligence or super intelligence are solved, we will still face deeply challenging ethical questions that will have tremendous consequences for earth-originating intelligent life. This is what is meant when it is said that we must do philosophy or ethics on a deadline. In the spirit of that, that’s why we’re going to be focusing this podcast today on metaethics and particularly the metaethics of David Pearce and Brian Tomasik, which also happen to be ethical views which are popular I would say among people interested in the AI safety community.
I think that Brian and David have enough disagreements that this should be pretty interesting. Again, just going back to this idea of ethics, I think given this goal, ethics can be seen as a lens through which to view safe AI design. It’s also a cognitive architecture to potentially be instantiated in AI through machine ethics. That would potentially make AIs ethical reasoners, ethical decision-makers, or both. Ethics can also be developed, practiced and embodied by AI researchers and their collaborators, and can also be seen as a discipline through which we can guide AI research and adjudicate it’s impacts in the world.
There is an ongoing debate about what the best path forward is for generating ethical AI, whether it’s project of machine ethics through bottom up or for top down approaches, or just a broad project of AI safety and AI safety engineering where we seek out corrigibility and docility, and alignment, and security in machine systems or probably even some combination of the two. It’s unclear what the outcome of AI will be but what is more certain though is that AI promises to produce and make relevant both age-old and novel moral considerations through areas such as algorithmic bias and technological disemployment and autonomous weapons, and privacy, big data systems, and even possible phenomenal states in machines.
We’ll even see new ethical issues with what might potentially one day be super intelligence and beyond. Given this, I think I’d like to just dive in first with you Brian and then, with you David. If you could just get into what the foundation is of your moral view? Then, afterwards, we can dive into the metaethics behind it.
Brian: Sure. At bottom, the reason that I placed foremost priority on suffering is emotion. Basically, the emotional experience of having suffered myself intensely from time to time and having empathy when I see others suffering intensely. That experience of either feeling it yourself or seeing others in extreme pain carries just a moral valence to me or a spiritual sensation you might call it that seems different from the sensation that I feel from anything else. It seems just obvious at an emotional level that say torture or being eaten alive by a predatory animal or things of that nature have more moral urgency than anything else. That’s the fundamental basis. You can also try to make theoretical arguments to come to the same conclusion. For example, people have tried to advance what’s called the asymmetry, which is the intuition that it’s bad to create a new being who will suffer a lot but it’s not wrong to fail to create a being that will be happy or at least not nearly as wrong.
From that perspective, you might care more about preventing the creation of suffering beings than about creating additional happy beings. You can also advance the idea that maybe preferences are always a negative debt that has to be repaid. Maybe when you have a preference that’s a bad thing and then, it’s only by fulfilling the preference that you erase the bad things. This would be similar to the way in which Buddhism says that suffering arises from craving. The goal is to cease the cravings which can be done either through the fulfilling the cravings, giving the organism what the organism wants or not having the cravings in the first place. Those are some potential theoretical frameworks from which to also derive a suffering-focused ethical view. For me personally, the emotional feeling is the most important basis.
David: I would very much like to echo what Brian was saying there. I mean there is something about the nature of intense suffering. One can’t communicate it to someone who hasn’t suffered. I mean someone who is for example born with congenital anesthesia or insensitivity to pain but there is something that is self-intimatingly nasty and disvaluable about suffering. However, evolution hasn’t engineered us of course to care impartially about the suffering of all sentient beings. My suffering and those of my genetic kin tends to matter far more to me than anything else. So far as we aspire to become transhuman and posthuman, we should be aspiring to this godlike perspective that takes into account the suffering of all sentient beings that the egocentric illusionist is a genetically adaptive lie.
How does this tie in to the question of posthuman super intelligence? Of course, there are very different conceptions of what posthuman super intelligence is going to be. I’ve always had what might say a more traditional conception of super intelligence in which posthuman super intelligence is going to be our biological descendants enhanced by AI but nonetheless still our descendants. However, there are what might crudely be called two other conceptions of post human super intelligence. One is this Kurzweilian fusion of humans and our machines, such that the difference between humans and our machine ceases to be relevant.
There’s another conception of super intelligence that you might say in some ways is the most radical is the intelligence explosion that was first conceived by I.J. Good but has been developed by Eliezer Yudkowsky, MIRI, and most recently by Nick Bostrom that conceives of some kind of runaway explosion, recursively self-improving AI and yes, there being no guarantee that the upshot of this intelligence explosion is going to be in any way congenial to human values as we understand them. I’m personally skeptic about the intelligence explosion in this sense but yeah, it’s worth clarifying what one means by posthuman super intelligence.
Lucas: Wonderful. Right before we dive into the metaethics behind these views and their potential relationship with AI alignment and just broadening the discussion to include ethics and exploring some of these key terms. I just like to touch on the main branches of ethics to provide some context and mapping for us. Generally, ethics is understood to have three branches, those being metaethics, normative ethics, and applied ethics. Traditionally, applied ethics is viewed as the application of normative and metaethical views to specific cases and situations to determine the moral status of said case or situation in order to decide what ought to be done.
An example of that might be applying one’s moral views to factory farming to determine whether or not it is okay to factory farm animals for their meat. The next branch moving upwards in abstraction would be normative ethics, which examines and deconstructs or constructs the principles and ethical systems we use for assessing the moral worth and permissibility of specific actions and situations. This branch is traditionally viewed as the formal ethical structures that we apply to certain situations and people are familiar with the deontological ethics and consequentialism, or utilitarianism, or virtue ethics. These are all normative ethical systems.
What we’ll be discussing today is primarily metaethics. metaethics seeks to understand morality and ethics itself. It seeks to understand the nature of ethical statements, attitudes, motivation, properties and judgments. It seeks to understand whether or not ethics relates to objective truths about the world and about people, or whether it’s just simply subjective or if all ethical statements are in fact false. Seeks to understand when people mean when they express ethical judgments or statements. This gets into things like ethical uncertainty and justification theories, and substantial theories, and semantic theories of ethics.
Obviously, these are all the intricacies of the end towards which AI maybe aimed. Given even the epistemology of metaethics and ethics in general that also have major implications for what AIs might be able to discover about ethics or what they may not be able to discover about ethics. Again today, we’ll just be focusing on metaethics and the metaethics behind David and Brian’s views. I guess just to structure this a little bit, just to really start to use the formal language of metaethics. As a little bit of background again, semantic theories is an ethics seek to address the question of what is the linguistic meaning of moral terms or judgments.
These are primarily concerned with whether or not moral statements contain truth values or are arbitrary and subjective. There are other branches within semantic theories but there are main two branches. The first of that is noncognitivism. Noncognitivism refers to a group of theories which hold that moral statements are neither true nor false because they do not express genuine propositions. Usually, these forms of noncognitive views with things like emotivism where people think that when people are expressing our moral views or attitudes like suffering is wrong, they’re simply saying an emotion like boohoo it’s a suffering. Or I’m expressing the emotion that I think that suffering merely bothers me or is bad to me. Rather than you expressing some sort of truth or false claim about the world. Standing in contrast to noncognitivism is just cognitivism, which refers to a set of theories which hold that moral sentences express genuine propositions. That means that they can have truth of false values.
This is to say that they are capable of being true or false. Turning back to Brian and David’s views, how would you each view your moral positions as you’ve expressed thus far. Would you hold yourself to a cognitivist view or a noncognitivist view. I guess we can start with you David.
David: Yes. I just say it’s just built into the nature of let’s say agony that agony is disvaluable. Now, you might say that there is nothing in the equations of physics and science that says anything over and above the experience itself, something like redness. Yeah, redness is subjective. It’s mind-dependent. Yet, unless one thinks minds don’t exist in the physical universe. Nonetheless, redness is an objective feature of the natural physical world. I would say that for reasons we simply don’t understand, pleasure-pain axis discloses the world’s inbuilt metric of value and disvalue. It’s not an open question whether something like agony is disvaluable to the victim.
Now, of course, someone might say, “Well, yes. Agony is disvaluable to you but it’s not disvaluable to me.” I would say that this reflects an epistemological limitation and that in so far as you can access what it is like to be me and I’m in agony, then you will appreciate why agony is objectively disvaluable.
Lucas: Right. The view here is a cognitivist view where you think that it is true to say that there is some intrinsic property or quality to suffering or joy that makes it I guess analytically true that it is valuable or disvaluable.
David: Yes. Well, it has to be very careful about using something like analytically because yeah, someone says that god is talking to me and it is analytically true that these voices are the voices of god. Yeah, one needs to be careful not to smuggle in too much. It is indeed very mysterious. What could be this hybrid descriptive evaluative state of finding something valuable or disvaluable. The intrinsic nature of the physical is very much an open question. I think there are good powerful reasons for thinking that the reality is exhaustively described by the equations of physics. The intrinsic nature of that stuff, the essence of the physical, the fire in the equations is controversial. Physics itself is silent.
Lucas: Right. I guess here, you would describe yourself given these views as a moral realist or an objectivist.
David: Yes, yes.
Brian: Just to jump in before we get to me. Couldn’t you say that your view is still based on mind-dependence because at least based on the thing about if somebody else were hooked up to you, that person would appreciate the badness of suffering. That’s still just dependent on that other mind’s judgment or even if you have somebody who could mind meld with the whole universe and experience all suffering at once. That would still be the dependence of that mind. That mind is judging it to be a bad thing. Isn’t it still mind-depending ultimately?
David: Mind-dependent but I would say that minds are features of the physical world and so, obviously one can argue for some kind of dualism but I’m monistic physicalist at least that’s my working assumption.
Brian: I think objective moral value usually … the definition is usually that it’s not mind-dependent. Although, maybe it just depends what definition we’re using.
David: Yes. It’s rather like something physicalism, it’s often used as a stylistic variant of materialism. One can be non-materialist physicalist and idealist. As I said, minds are objective features of the physical world. I mean at least tentatively at any rate taks seriously the idea that our experience discloses the intrinsic nature of the physical. This is obviously controversial opinion. It’s associated with someone like Galen Straussen or more likely Phil Goff but it stretches back via Grover Maxwell and Russell, ultimately to Schopenhauer. A much more conventional view of course would be that the intrinsic nature of the physical, the fire and the equations is non-experiential. Then, at sometime during the late pre-Cambrian, something happened. Not just organizational but ontological eruption into the fabric of the world first person experience.
Lucas: Just to echo what Brian was saying. The traditional objectivist or more realist view is that the way in which science is the project of interrogating third person facts like what is simply true about the person regardless of what we think about it. In some ways, I think that traditionally the moral realist view is that if morality deals with objective facts, then, these facts are third person objectively true and can be discovered through the methods and tools of ethics. In the same way that someone who might be a mathematical realist would say that one does not invent certain geometric objects rather one discovers them through the application of mathematical reasoning and logic.
David: Yes. I think it’s very tempting to think of first person facts as having some kind of second rate ontological status but as far as I’m concerned, first person facts are real. If someone is in agony or experiencing redness, these are objective tracks about the physical world.
Lucas: Brian, would you just like to jump in with the metaethics behind your own view that you discussed earlier?
Brian: Sure. On cognitivism versus noncognitivism, I don’t have strong opinions because I think some of the debate is just about how people use language, which is not a metaphysical fundamental issue. It’s just like however humans happen to use language. I think the answer to the cognitivism, noncognitivism, if I had to say something would be it’s messy probably. Humans do talk about moral statements, the way they talk about other statements, other factual statements. We use reasoning and we care about maintaining logical consistency among sets of moral statements. We treat them as regular factual statements in that regard. There maybe also be a sense in which moral statements do strongly express certain emotions. I think probably most people don’t really think about it too much.
It’s like people know what they mean when they use moral statements and they don’t have a strong theory of exactly how to describe what they mean. One analogy that you could use is I think moral statements are like swear words. They’re used to make people feel more strongly about something or express how strongly you feel about something. People think that they don’t just refer to one’s emotions and even at a subjective level. If you say my moral view is suffering as bad. That feels different than saying I like ice cream because there’s a deeper, more spiritual or more like fundamental sensation that comes along with the moral statements that doesn’t come along with the, “I like ice cream,” statements.
I think metaphysically, that doesn’t reflect anything fundamental. It just means that we feel differently about moral statements and thoughts than about nonmoral ones. Subjectively, it feels different. Yeah. I think most people just feel that difference and then, exactly how you cash out whether that’s cognitive or noncognitive is a semantic dispute. My metaphysical position is anti-realism. I think that moral statements are mind-dependent. They reflect ultimately our own preferences even if they maybe very spiritual and like deep fundamental preferences. I think Occam’s Razor favors this view because it would add complexity to the world for there to be independent truths. I’m not even sure what that would mean, based on similar reason, I reject mathematical truths and anything non-physicalist. I think moral truths, mathematical truths and so on can all be thought of as fictional constructions that we make. We can reason within these fictional universes of ethics and mathematics that we construct using physical thought processes. That’s my basic metaphysical stance.
Lucas: Just stepping back to the cognitivism and noncognitivism issue, I guess I was specifically interested in yourself. When you were expressing your own moral view earlier, did you find that it’s simply a mixture of expressing your own emotions and also, trying to express truth claims or given your anti-realism, do you think that you’re simply only expressing emotions when you’re conveying your moral view?
Brian: I think very much of myself as an emotivist. It’s very clear to me that what I’m doing when I do ethics is what the emotivist as people are doing. Yes, since I don’t believe in moral truth, it would not make sense for me to be gesturing at moral truths. Except maybe in so far as my low level brain wiring intuitively thinks in those terms.
David: Just to add to this and that although it is possible to imagine, say something you like spectrum inversion, color inversion, some people who like ice cream and some people who hate ice cream. One thing it isn’t possible to do is imagine a civilization in which an inverted pleasure-pain axis. It seems to just be a basic fact about the world that unbearable, agony and despair is experienced as disvaluable and even cases that might appear to contradict this slight that say that masochist are in fact merely confirm a claim because, yeah, I mean the masochist enjoys the intensity rewarding release of endogenous opioids when the masochist undergoes activities that might otherwise be humiliating or painful.
Lucas: Right. David, it seems you’re making a claim about there being a perfect convergence in the space of all possible minds among the pleasure-pain axis having the same sort of function. I guess I’m potentially just missing the gap or pointing out the gap between that and I guess your cognitivist objectivism?
David: It seems to be built into the nature of let’s say agony or despair itself that it is disvaluable. It’s not I’m in agony. Is this valuable or not? It’s not open question whereas anything else. However, abhorrent, your eye might regard it one can still treat it as an open question and ask, is child abuse or slavery really disvaluable? Whereas in the case of agony, it’s built in the nature of the experience itself.
Lucas: I can get behind that. I think that sometimes when I’m feeling less nihilistic about morality, I am committed to that view. I think just to push back a little bit here. I think in the space of all possible minds, I think I can imagine a mind which has a moral judgment and commitment to the maximization of suffering within itself and within the world. It’s simply … it’s perfect in that sense. It’s perfect in maximizing suffering for itself in the world and it’s judgment and moral epistemology is very brittle, such that it will never change or deviate from this. How would you deal with something like that?
David: Is it possible? I mean one can certainly imagine a culture in which displays of machismo and the ability to cope with great suffering are highly valued and would be conspicuously displayed. This would fitness enhancing but nonetheless, it doesn’t really challenge the sovereignty of their pleasure-pain axis as the axis of value and disvalue. Yeah, I would struggle to conceive some kind of intelligence that values its own despair or agony.
Brian: From my perspective, I agree with what Lucas is saying depending on how you define things. One definition of suffering could be that part of the definition is desire to avoid it. From that perspective, you could say it’s not possible for an agent to seek something that it avoids. I think you could have systems where there are different parts in conflict so you could a hedonic assessment system that outputs a signal that this is suffering but then, another system then chooses to favor the suffering. Humans even have something like this when we can override our own suffering. We might have hedonic systems that say going out in the cold is painful but then, we have other systems or other signals that override that avoidance response and cause us to go out in the cold anyway for the sake of something else. You could imagine the wiring, such that wasn’t just enduring pain for some greater good but the motivational system was actively seeking to cause the hedonic system more experiences of pain. It’s just that that would be highly nonadaptive so we don’t see that anywhere in nature.
David: I would agree with what Brian says there. Yes, very much so.
Lucas: Okay. Given these views, would you guys have expressed and starting to get a better sense of them. Another branch of metaethics here that we might be able to explore how it fits in with your guy’s theories, justification theories within metaethics. These are attempts at understanding moral epistemology and motivation for acting in accordance with morality. It attempts to answer the question of how are moral judgments to be supported or defended? If possible, how does one make moral progress? This again will include moral epistemology and in terms of AI and value alignment, if one is anti-realist as Brian is or if one is an objectivist as David is then this completely changes the way and path forward towards AI alignment and value alignment if we are realist as David is then a sufficiently robust and correct moral epistemology in an AI system could essentially realize the hedonistic imperative as David sees it, where you would just have an optimization process extending out from planet earth, which was maximizing for the objectively good hedonic states in all possible sentient beings. I guess it’s a little unclear for me how this fits in with David’s theory or how David’s theory would be implemented.
David: There is a real problem with any theory of value that makes sovereign either the minimization of suffering or classical utilitarianism. Both Buddhism and negative utilitarianism appear to have this apocalyptic implication that if overriding responsibilities to minimize suffering but no. Isn’t that cleanest, quickest, efficient way to eliminate suffering to sterilize the planet, which is now technically feasible and though one can in theory imagine cosmic rescue missions if there is sentence elsewhere. There is apparently this not so disguised apocalyptic implication. When Buddha says allegedly or hopefully I teach one thing and one thing only. Suffering and the relief of suffering, or the end of suffering, yeah, in his day, there was no way to destroy the world. Today, there is.
Much less discussed, indeed I haven’t seen it adequately or not discussed at all in the scholarly literature is that a disguised implication of a classical utilitarian ethic that gives this symmetry to pleasure and pain is that we ought to be launching something like utilitronium shockwave where utilitronium is matter and energy optimized for pure bliss. The shockwave alludes to its velocity of propagation. Though humans perhaps are extremely unlikely even if and when we’re in a position to do so to launch a utilitronium shockwave. If one imagines a notional artificial, super intelligent with a utility function of classical utilitarianism, why wouldn’t that super intelligent launch a utilitronium shockwave that maximizes the cosmic abundance of positive value within our cosmological horizon.
Personally, I would imagine a future of gradients of intelligent bliss. I think that is in fact sociologically highly likely that post-human civilization will have a hedonic range that’s very crudely and schematically as is minus 10 to zero, to plus 10. I can imagine future civilization of let’s say plus 70 to plus 100 or plus 90 to a plus 100. From the perspective classical utilitarianism and classical utilitarianism is arguably the dominant some kind of watered-down version at least is the dominant secular ethic, and academia and elsewhere. That kind of civilization is suboptimal. It’s not moral or apparently has this obligation to launch this kind of cosmic orgasm so to speak.
Lucas: Right. I mean I think just pushing a little bit back on the first thing that you said there about the very negative scenario, which I think people tend to see as an implication of a suffering reducing focused ethic where there can’t be any suffering if there’s no sentient beings. That to me isn’t very plausible because it discounts the possibility of future wellbeing. I take the view that we actually do have a moral responsibility to create more happy beings and I view a symmetry between pain and suffering. I don’t have a particularly suffering-focused ethic where I think there’s asymmetry where I think we should alleviate suffering prior to maximizing wellbeing. I guess David, maybe you can just unpack a little bit before we jump into these justification theories about whether or not you view there as being asymmetry between suffering and wellbeing.
David: I think there’s an asymmetry. There’s this fable of Ursula Le Guin, short story, Ones Who Walk Away From Omelas. We’re invited to imagine this city of delights, vast city of incredible wonderful pleasures but the existence of Omelas, this city of delights depends on the torment and abuse of a single child. The question is would you walk away from Omelas and what does walking away from Omelas entail. Now, personally I am someone who would walk away from Omelas. The world does not have an off switch, an off button and I think if one is whether a Buddhist of a negative utilitarian, or someone who believes in suffering-focused ethics, rather than to consider these theoretical apocalyptic scenarios it is more fruitful to work with secular and religious life lovers to phase out the biology of suffering in favor of gradients of intelligent wellbeing because one of the advantages of hedonic recalibration, i.e. ratcheting up hedonic set points is that it doesn’t ask people to give up their existing values and preferences with complications.
If you ask me, just convenient, this is a rather trivial example. Imagine, 100 people, 100 different football teams. There’s simply no way to reconcile conflicting preferences but what one can do if one ratchets up everyone’s hedonic set point is to improve quality of life. By focusing on ratcheting up hedonic set points rather than trying to reconcile the irreconcilable, I think this is the potential way forward.
Brian: There are a lot of different points to comment on. I agree with David that negative utilitarians should not aim for world destruction for several reasons. One being that it would be make people turn against the cause of suffering reduction. It’s important to have other people not regard that as something to be appalled by. For example, animal rights terrorists, plausibly give the animal rights movement a pretty bad name and may set back the cause of animal rights by doing that. Negative utilitarians would almost certainly not succeed anyway, so the most likely outcome is that they hurt their own cause.
As far as David’s suggestion of improving wellbeing to reduce disagreements among competing football teams, I think that would potentially help giving people greater wealth and equality in society can reduce some tensions. I think there will always be some insatiable appetites especially from moral theories. For example, classical utilitarian has an insatiable appetite for computational resources. Egoists and other moral people may have their own insatiable appetites. We see that in the case of humans trying to acquire wealth beyond what is necessary for their own happiness. I think there will always be those agents who want to acquire as many resources as possible. The power maximizers will tend to acquire power. I think we still have additional issues of coordination and social science being used to control the thirst for power among certain segments of society.
Lucas: Sorry. Just to get this clear. It sounds like you guys are both committed to different forms of hedonic consequentialism. You’re bringing up preferences and other sorts of things. Is there a room for ultimate metaphysical value of preferences within your ethics? Or are preferences simply epistemically and functionally useful indicators of what will often lead to positive hedonics and agents within you guys as ethical theories?
Brian: Personally, I care to some degree about both preferences and hedonic wellbeing. Currently, I care some more about hedonic wellbeing just based on … from my meta-ethical standpoint, it’s ultimately my choice, what I want to care about. I happen to care a lot about hedonic suffering when I imagine that. From a different standpoint, you can argue that ultimately the golden rule for example commits you to caring about whatever it is and other organisms cares about whether that’s hedonic wellbeing or some arbitrary wish. For example, a deathbed wish would be a good example of a preference that doesn’t have hedonic content to it, whether you think it’s important to keep deathbed wishes even after a person has died ignoring side effects in terms of later generations realizing that promises are not being kept.
I think even ignoring those side effects, a deathbed wish does have some moral importance based on the idea that if I had a deathbed wish, I would strongly want it to be carried out if you are acting the way you want others to treat you. Then, you should care to some degree about other people’s deathbed wishes. Since I’m more emotionally compelled by extreme hedonic pain, that’s what I give the most weight to.
Lucas: What would your view be of an AI or machine intelligence, which has a very strong preference, whatever that computational architecture might look like a bit be flip one way rather than another. It just keeps flipping a bit back and forth, and then, you would have a preference utilitronium shockwave going out in the world. It seems intuitive to me also that we only care about preferences and so far as they … I guess this previous example does this work for me is that we only care about preferences in so far as that they have hedonic effects. I’ll bite the bullet on the deathbed wish thing and I think that ignoring side effects like if someone wishes for something and then, they die, I don’t think that we need to actually carry it out if we don’t think it will maximize hedonic wellbeing.
Brian: Ignoring the side effects. There are probably good hedonistic reasons to fulfill deathbed wishes so that current people will not be afraid that their wishes won’t be kept also. As far as the bit flipping, I think a bit flipping agent does, I think it’s preference does have moral significance but I weigh organisms in proportion to the sophistication of their minds. I care more about a single human than a single ant for example because a human has more sophisticated cognitive machinery. It can do more kinds of … have more kinds of thoughts about its own mental states. When a human has a preference, there’s more stuff going on within its brain to back that up so to speak. A very simple computer program that has a very simple preference to flip a bit doesn’t matter very much to me because there’s not a lot of substance behind that preferences. You could think of it as an extremely simple mind.
Lucas: What if it’s a super intelligence that wants to keep flipping bits?
Brian: In that case, I would give a significant way because it has so much substance in its mind. It probably has lots of internal processes that are reflecting on its own welfare so to speak. Yeah, if it’s a very sophisticated mind, I would give that significant weight. It might not override the preferences of seven billion humans combined. I tend to give less than linear weight to larger brains. As the size of the brain increases, I don’t scale the moral weight of the organism exactly linearly. That would alter reduce that utility monster inclusion.
Lucas: Given Brian’s metaethics being an anti-realist and viewing him as an emotivist, I guess the reasons or arguments that you could provide against this view would only be, they don’t refer back to any metaphysical objective, anything really. David, wouldn’t you say that in the end, it would just be your personal emotional choice whether or not to find something compelling here.
David: It’s to do with the nature of first person facts. What is it that the equations of physics ultimately describe and if you think subjectivity or at least take it seriously the conjecture of that subjectivity is the essence of the physical, the fire in the equations, then yeah, it’s just objectively in the case that first person agony is disvaluable. Here we get into some very controversial issues. I would just like to go back to one thing Brian was saying about sophistication. I don’t think it’s plausible that let’s say a pilot whale is more cognitively sophisticated than humans but it’s very much an open question whether a pilot whale with a substantially larger brain, substantially larger neocortex, substantially larger pain and pleasure centers that the intensity of experience undergone by a pilot whale let’s say may be greater than that of humans. Therefore, other things being equal, I would say that it’s so profoundly aversive states undergone by the whale matter more than a human. It’s not the level of sophistication or complexity that counts.
Lucas: Do you want to unpack a little bit your view about the hedonics versus the preferences, and whether or not preferences have any weight in your view?
David: Only indirectly weight and that ultimately, yeah, as I said I think what matters is the pleasure-pain axis and preferences only matter in so far as they impact that. Thanks to natural selection, we have countless millions and billions of preferences that are being manufactured all the time as social primates countless preferences conflict with each other. There is simply no way to reconcile a lot of them. Whereas one can continue to enrich and enhance wellbeing so, yeah sure. Other things being equal satisfy people’s preferences. In so many contexts, it is logically impossible to do so from politics, the middle east, interpersonal relationships, the people’s desire to be the world famous this, that or the other. It is logically impossible to satisfy a vast number of preferences.
Lucas: I think it would be interesting and useful to dive into, within justification theories, like moral epistemology and ethical motivation. I think I want to turn to Brian now. Brian, I’m so curious to know if it’s possible given your view of anti-realism and suffering focused ethics, whether or not you can make moral progress or what it means to make moral progress. How does one navigate the realm of moral issues in your view, given the metaethics that you hold? Why ought I or others, or why not ought I or others to follow your ethics or not?
Brian: Moral progress I think can be thought of as many people have a desire to improve their own moral views using standards of improvement that they choose. For example, a common standard would be I think that the moral views that I will hold after learning more, I will generally now defer to those views as the better ones. There might be some exceptions especially if you get too much into some subject area that distorts your thinking relative to the way it was before. Basically, you can think of brain state changes as either being approved of or not approved of by the current state. Moral progress would consist of doing updates to your brain that you approve of, like installing updates to computer that you choose to install.
That’s what moral progress would be. Basically, you designated which changes do I want to happen and then, if those happen according to the rules then it’s on a progress relatively to what my current state thought. You can have failures of goal preservation. The example that Eliezer Yudkowsky gives is if you give Gandhi a pill that would make him want to kill people. He should not take it because that would change his goals in a way that his current goals don’t approve of. That would be moral anti-progress relative to Gandhi’s current goals. Yeah, that’s how I would think of it. Different people have different preferences about how much you can call preference idealization.
Preference idealization is the idea of imagining what preferences you would hold if you knew more, were smarter, had more experiences, and so on. Different people couldn’t want different amounts of preference idealization. There are some people who say I have almost no idea what I currently value and I want to defer that to an artificial intelligence to help me figure that out. In my case, it’s very clear to me that extreme suffering is what I want to continue to value and if I change from that stance, that would be a failure of goal preservation relative to my current values. There are still questions on which I do have significant uncertainty in a sense that I would defer to my future self.
For example, the question of how to weigh different brain complexities against each other is something where I still have significant uncertainty. The question of how much weight to give to what’s called higher order theory in consciousness versus first order theories basically how much you think that high level thoughts are an important component of what consciousness is. That’s an issue where I have significant moral uncertainty. There are issues where I want to learn more, think more about it, have more other people think about it before I make up my mind fully on what I think about that. Then, why should you hold my moral view? The real answer is because I want you to and I’ll try to come up with arguments to make it sound more convincing to you.
David: I find subjectivism troubling. I support my football team is Manchester United. I wouldn’t take a pill, less induced me to support Manchester City because that would subvert my values in some sense. Nonetheless, ultimately, support for Manchester United is arbitrary. It is a support for the reduction of suffering merely a kin to I once support lets say of Manchester United.
Brian: I think metaphysically, they’re the same. It feels very different. There’s more of a spiritual, like your whole being is behind reduction of suffering in the way that’s not true for football teams. Ultimately, there’s no metaphysical difference.
Intentional objects ultimately are arbitrary that natural selection has eschewed us a define certain intentional objects. This is philosophy jargon for the things we care about, whether it’s a football or politics, or anything. Nonetheless, it’s unlike these arbitrary intentional objects, it just seems to built into the nature of agony or despair that they are disvaluable. It’s simply not possible to instantiate such states and find it an open question whether they’re disvaluable or not.
Brian: I don’t know if we want to debate now but I think it is possible. I mean we already have examples of one organism who finds the suffering of another organism to be possibility valuable.
David: They are not mirror-touch synesthete. They do not accurately perceive what is going on and in so far as one does either as a mirror-touch synesthete or can do the equivalent of a Vulcan mind meld or something like that, one is not going to perceive the disvaluable as valuable. Its an epistemological limitation.
Brian: My objection to that is it depends how you hook up the wires between the two minds. Like if you hook up one person suffering to another person’s suffering, then the second person will say it’s also bad. If you hook up one person’s suffering neurons to another person’s pleasure neurons, then, the second person will say it’s good. It just depends how you hook up the wires.
David: It’s not all or nothing but if one is let’s say a mirror-touch synesthete today and someone’s, they stub their toe and you have an experience of pain, it’s simply not possible to take pleasure in their stubbing their toe. I think if one does have this notional god’s eye perspective, an impartial view from nowhere that one will act accordingly.
Brian: I disagree with that because I think you can always imagine just reversing the motivational wires so to speak. Just flip the wire that says this is bad. Flip it to saying this is good in terms of the agent’s motivation.
David: Right. Yes. I was trying to visualize what this would entail.
Brian: Even in a synesthete example, just imagine a brain where the same stimulus currently in normal humans, this stimulus triggers negative emotional responses just have the neurons hook up to the positive emotional responses instead.
David: Once again, wouldn’t this be an epistemological limitation rather than some deep metaphysical truth about the world?
Brian: Well, it depends how you define epistemology but you could be a psychopath where you correctly predict another organism’s behavior but you don’t care. You can have a difference between beliefs and motivations. The beliefs could correctly recognize this I think but the motivations could have the wires flipped such that there’s motivation to cause more of the suffering.
David: It’s just that I would say that the psychopath has an epistemological limitation in that the psychopath does not adequately take into account other perspectives. In that sense, psychopath lacks an adequate theory of mind. The psychopath is privileging one particular here and now over other here and nows, which is not metaphysically sustainable.
Brian: It might be a definitional dispute like whether you can consider having proper motivation to be part of epistemological accuracy or not. It seems that you’re saying if you’re not properly motivated to reduce … you don’t have proper epistemological access to it by definition.
David: Yes. One has to be extremely careful with using this term by definition. Yes. I would say that we are all to some degree sociopathic. One is quasi sociopathic to one’s future self for example and so far is one let’s say doesn’t prudently save but squanders money and stuff. We are far more psychopathic towards other sentient beings because one is failing to fully to take into account their perspective. It’s hardwired epistemological limitation. One thing I would very much agree with Brian on is moral uncertainty and being prepared to reflection and take into account other perspectives and allow for the possibility one can be wrong. It’s not always possible to have the luxury of moral reflection uncertainty.
If a kid is drowning, hopefully one that dashes into the water to save the kid. Is this the right thing to do? Well, what happens if the kid, this is the real story, happens to be a toddler grows up to the Adolf Hitler and plunges the world into war. One doesn’t know the long term consequences of one’s action. Wherever possible, yes, one urges reflection and caution in the context of a discussion or debate. One isn’t qualifying, one’s uncertainty, agnosticism carefully but in a more deliberative context perhaps of what one should certainly do so.
Lucas: Let’s just bring it a little bit back to the ethical epistemology behind and ethical motivation behind your hedonistic imperative given your objectivism. I guess here, it’d also be interesting to know if you could also explore key metaphysical uncertainties and physical uncertainties, and what more and how we might go about learning about the universe such that your view would be further informed.
David: Happy to launch into long spiel about my view. One thing I think it really is worth stressing is that one doesn’t need to buy into any form of utilitarianism or suffering-focused ethics to believe that we can and should phase out the biology of involuntary suffering. It’s common to all manner of secular and religious views that we should be other things being equal minimizing suffering reducing unnecessary suffering and this is one thing that technology, it could buy a technology allows us to do and support for something like universal access for implantation, genetic screening, phasing out factory farming and shutting slaughter houses, going on to essentially reprogram the biosphere.
It doesn’t involve a commitment to some particular one specific ethical or meta-ethical view. For something like pain-free surgery anesthesia, you don’t need to sign up for it to recognize it’s a good thing. I suppose my interest is very much in building bridges with other ethical traditions. Yeah, I am happy to go into some of my own personal views but I just don’t want to tie this idea that we can use bio-tech to get rid of suffering into anything quirky or idiosyncratic to me. I have a fair number of idiosyncratic views.
Lucas: It would be interesting if you’d explain whether or not you think that super intelligences or AGI will necessarily converge on what you view to be objective morality or if that is ultimately down to AI researchers to be very mindful of implementing.
David: I think there are real risk here when one starts speaking as though posthuman super intelligence is going to end up endorsing a version of one’s own views and values, which a priori ,if one thinks about, is extremely unlikely. I think too one needs to ask yeah, when I was talking about post human super intelligence, if post human super intelligence is biological descendants, I think post human super intelligence will have a recognizable descendant of pleasure-pain axis. I think it will be ratcheted up so that say experience below hedonic zero is impossible.
In that sense, I do see a convergence. By contrast, if one has a conception of post human super intelligence such that post human super intelligence may not be sentient, may not be experiential at all then, there is no guarantee that such a regime would be friendly to anything recognizably human in its values.
Lucas: The crux here there are different ways of doing value alignment and one such way is descriptively through a super intelligence being able to gain enough information about the set of all values that human beings have and say aligning to those or to some fraction of those or to some idealized version of those through something like a coherent extrapolated volition. Another one is where we embed a moral epistemology within the machine system, so that the machine becomes an ethical reasoner, almost a moral philosopher in its own right. It seems that given your objectivist ethics that with that moral epistemology, it would be able to converge on what is true. Do these different paths forward makes sense to you and/or it also seems that the role of mind melding seems to be very crucial and core to the realization of the correct ethics in your view?
David: With some people, their hearts sinks when the topic of machine consciousness crops up because they know it’s going to be a long inconclusive philosophical discussion and a shortage of any real empirical tests. Yeah, I will just state. I do not think a classical digital computer is capable of phenomenal binding, therefore it will not understand the nature of consciousness or pleasure and pain, and I see the emotion of value and disvalue is bound with the pleasure-pain axis. In that sense, I think what we’re calling machine artificial general intelligence, in one sense it’s invincibly ignorant. I know a lot of people would disagree with this description but if you think humans or at least some humans spend a lot of their time thinking about, talking about, exploring consciousness and it’s all varieties in some cases exploring psychedelia, what are we doing? There are vast range of cognitive domains that are completely, cognitively inaccessible to digital computers.
Lucas: Putting aside the issue of machine consciousness, it seems that being able to first-person access hedonic states provides a extremely foundational and core motivational or at least epistemological role in your ethics David.
David: Yes. I mean part of intelligence involves being able to distinguish the important from the trivial, which ultimately as far as I can see boils down to the pleasure-pain axis. Digital zombies have no conception of what is important or what is trivial I would say.
Lucas: Why would that be if a true zombie in the David Chalmers sense is functionally isomorphic to a human. Presumably that zombie would properly care about suffering because all of its functional behavior is the same. Do you think in the real world, digital computers can’t do the same functional computation that a human brain does?
David: None of us have the slightest idea how one would set about programming a computer to do the kinds of things that humans are doing when they talk about and discuss consciousness when they take psychedelics or discuss the nature of the self. I’m not saying work arounds are impossible. I just don’t think they’re spontaneously going to happen.
Brian: I agree. Just like building intelligence itself, it requires a lot of engineering to create those features of humanlike psychology.
Lucas: I don’t see why it would be physically or technically impossible to instantiate an emulation of that architecture or an architecture that’s basically identical to it in a machine system. I don’t understand why computer architecture, computer substrate is really so different from biological architecture or substrate such that it’s impossible for this case.
David: It’s whether one feels the force of the binding problem or not. The example one can give, imagine the population of the USA are skull bound minds, imagine them implementing any kind of computation you like. They are ultra fast, electromagnetic signaling far faster than the retro chemical signaling and the CNS is normally conceived. Nonetheless, short of a breakdown with monistic physicalism, there is simply no way that the population of the USA is spontaneously going to become subject to experience to apprehend perceptual objects. Essentially, all you have is a micro experiential zombie. The question is why are 86 billion odd membrane bound supposedly classical neurons any different?
Why aren’t we micro experiential zombies? One way to appreciate, i think, the force, the adaptive role of phenomenal binding is to look at syndromes where binding even harshly breaks down such as simultanagnosia where the subject can only see one thing at once. Or motion blindness or akinetopsia, where one can’t apprehend motion or severe forms of schizophrenia where there is no longer any unitary self. Somehow right now, you instantiate a unitary world simulation populated by multiple phenomenally bound dynamical objects and this is tremendously fitness enhancing.
The question is how can a bunch of membrane-bound nerve cells, a pack of neurons carry out what is classically impossible. I mean one can probe the CNS with temporary course grained and neuro scans… individual feature process, edge detectors, motion detectors, color detectors. Apparently, there are no perceptual objects there. How is it that right now that your mind/brain is capable of running this egocentric world simulation in almost real time. It’s astonishing computational feat. I argue for a version of quantum mind but one needn’t buy into this to recognize that it’s profound an unsolved problem. I mean why aren’t we like the population of the USA?
Lucas: Just to bring this back to the AI alignment problem and putting aside issues in phenomenal binding, and consciousness for a moment. Putting aside also the conception that super intelligence is likely to be some sort of biologic instantiation if we imagine the more AI safety mainstream approach, the MIRI idea of there being simply a machine super intelligence. It seems that in your view David and I think here this elucidates a lot of the interdependencies and difficulties where one’s meta-ethical views are intertwined in the end with what is true about consciousness and computation. It seems that in your view, close to or almost maybe perhaps impossible to actually do AI alignment or value alignment on machine super intelligence.
David: It is possible to do value alignment but I think the real worry is that if you take the MIRI scenario seriously, this recursively self-improving software that will somehow … This runaway intelligence. There’s no knowing where it may lead by MIRI as far as I know have very different conception of the nature of consciousness and value. I’m not aware that they tackle the binding problem. I just don’t see that unitary subjects of experience or values, or pleasure-pain axis are spontaneously going to emerge from software. It seems to involve some form of strong emergence.
Lucas: Right. I guess to tie this back and ground it a bit. It seems that the portion of your metaethics, which is going to be informed by empirical facts about consciousness and minds in general is the view in there that without access to the phenomenal pleasure-pain axis, what you view to have an intrinsic goodness or wrongness to it because it is foundationally and physically, and objectively the pleasure-pain axis of the universe, the heat and the spark in the equation I guess as you say. Without access to that, then ultimately, one will go awry in one’s ethics if one does not have access to phenomenal hedonic states given that that’s the core of value.
David: Yeah. In theory, an intelligent digital computer stroke robot could impartially pave the cosmos with either dolorium or hedonium without actually understanding the implications of what it was doing. Hedonium being or utilitronium, matter and energy optimized for pure bliss. Dolorium being matter and energy optimized for, lack of a better word, for pure misery or despair. That’s the system in question we do not understand the implications of what it was doing. That I know a lot of people do think that well, sooner or later, classical, digital computers, our machines are going to wake up. I don’t think it’s going to happen. Rather we’re not talking about hypothetical quantum computers next century and beyond. Simply an expansion of today’s programmable digital computers. I think they’re zombies and will remain zombies.
Lucas: Fully autonomous agents which are very free and super intelligent in relation to us will in your view require a fundamental access to that which is valuable, which is phenomenal states, which is the phenomenal pleasure-pain axis. Without that, it’s missing its key epistemological ingredient. It will fail in value alignment.
David: Yes, yeah, yeah. It just simply does not understand the nature of the world. It’s rather like claiming where the system is intelligent but doesn’t understand the second or of thermodynamics. It’s not a full spectrum super intelligence.
Lucas: I guess my open question there would be then, whether or not it would be possible to not have access to fundamental hedonic states but still be something of a Bodhisattva with a robust moral epistemology that was heading in the right direction or what might be objective.
David: The system in question would not understand the implications of what it was doing.
Lucas: Right. It wouldn’t understand the implications but if it got set off in that direction and it was simply achieving the goal, then I think in some cases we might call that value aligned.
David: Yes. One can imagine … Sorry Brian. Do intervene when you’re ready but yeah, one could imagine for example being skeptical of the possibility of interstellar travel for biological humans but programming systems to go out across the cosmos or at least within our cosmological horizons and convert matter and energy into pure bliss. I mean one needn’t assume that this will apply to our little bubble of civilization but watch if we do about inert matter and energy elsewhere in the galaxy. One can leave it as it is or if one is let’s say a classical utilitarian, one could convert it into pure bliss. Yeah, one can send out probes. One could restructure, reprogram matter and energy in that way.
That would be a kind of compromise solution in one sense. Keep complexity within our little tiny bubble of civilization but convert the rest of the accessible cosmos into pure bliss. Though that technically would not strictly speaking maximize the abundance of positive value in our hubble volume, nonetheless it could become extraordinarily close to it from a classical utilitarian perspective.
Lucas: Brian, do you have anything to add here?
Brian: While I disagree on many, many points, I think digital computation is capable of functionally similar enough processing as the brain does. Even that weren’t the case, a paperclip maximizer with a very different architecture would still have a very sophisticated model of human emotions and its motivations wouldn’t be hooked up to those emotions but it would understand for all other sense of the word understand human pleasure and pain. Yeah, I see it more as a challenge of hooking up the motivation properly. As far as my thoughts on alignment in general based on my metaethics, I tend to agree with the default approach like the MIRI approach, which is unsurprising because MIRI is also anti-realist on metaethics. That approach sees the task as taking human values and somehow translating them into the AI and so that could be in a variety of different ways learning human values implicitly from certain examples or with some combination of maybe top down programming of certain ethical axioms.
That could send to exactly how you do alignment and there are lots of approaches to that. The basic idea that you need to specifically replicate the complexity of human values in machines and the complexity of the way humans reason. It won’t be there by default in any way shared between my opinion and that of the mainstream AI alignment approach.
Lucas: Do you take a view then similar to that of coherent extrapolated volition?
Brian: In case anybody doesn’t know, coherent extrapolated volition is Eliezer Yudkowsky’s idea of giving the AI the meta … You could call it a metaethics. It’s a meta rule for learning values to take humanity and think about what humanity want to want if it was smarter, knew, had more positive interactions with each other and thought faster and then, try to identify points of convergence among the values of different idealized humans. In terms of theoretical things to aim for, I think CEV is one reasonable target for reasons of cooperation among other humans. I mean if I controlled the world, I would prefer to have the AI implement my own values rather than humanities values because I care more about my values. Some human values are truly abhorrent to me and others seem at least unimportant to me.
In terms of getting everybody together to not fight endlessly over the outcome of AI in this theoretical scenario, CEV would be a reasonable target to strive for. In practice, I think that’s unrealistic like a pure CEV is unrealistic because the world does not listen to moral philosophers to any significant degree. In practice, things are determined by politics, economic power, technological and military power, and forces like that. Those determine most of what happens in the world. I think we may see approximations to CEV that are much more crude like you could say that democracy is an approximation to CEV in the sense that different people with different values, at least in theory, discuss their differences and then, come up with a compromise outcome.
Something like democracy maybe power-weighted democracy in which more powerful actors have more influence will be what ends up happening. The philosophers dream of idealizing values to perfection is unfortunately not going to happen. We can push in directions that are slightly more reflective. We can push aside towards slightly more reflection towards slightly more cooperation and things like that.
David: Couple of points that first, what to use an example we touched on before. What would be coherent extrapolated volition for all the world’s football supporters? Essentially, there’s simply no way to reconcile all their preferences. One may say that if they were fully informed football supporters, wouldn’t waste their time passionately supporting one team or another but essentially I’m not sure that the notion of coherent extrapolated volition there would make sense. Of course, there are more serious issues in football but the second thing when it comes to the nature of value, regardless of one’s metaphysical stance on whether one’s a realist or an anti-realist about value. I think it is possible by biotechnology to create states that are empirically, subjectively far more valuable than anything physiologically feasible today.
Take Prince Myshkin in Dostoevsky’s The Idiot. Like Dostoevsky was a temporal lobe epileptic and he said, “I would give my whole life for this one instant.” Essentially, there are states of consciousness that are empirically super valuable and rather than attempting to reconcile irreconcilable preferences, I think you could say that we should be and so far as we aspire to long term full spectrum super intelligence, perhaps we should be aiming to create these super valuable states. I’m not sure whether it’s really morally obligatory. I said my own focus is on the overriding importance of phasing out suffering but for someone who does give some weight or equal weight to positive experiences positively valuable experiences, that there is a vast range of valuable experience that is completely inaccessible to humans that could be engineered via biotechnology.
Lucas: A core difference here is going to be that given Brian’s view of anti-realism, AI alignment or value alignment would in the end be left to those powers which he described in order to resolve irreconcilable preferences. That is if human preferences don’t converge strongly enough after enough time and information that there are no longer irreconcilable preferences, which I guess I would suppose is probably wrong.
Brian: Which is wrong?
Lucas: That it would be wrong that human beings preferences would converge strongly enough that there would no longer be irreconcilable preferences after coherent extrapolated volition.
Brian: Okay, I agree.
Lucas: I’m saying that in the end, value alignment would be left up to economic forces, military forces, other forces to determine what comes out of value alignment. In David’s view, it would simply be down to if we could get the epistemology right and we could know enough about value and the pleasure-pain axis and the metaphysical status of phenomenal states that that would be value alignment would be to capitalize on that. I didn’t mean to interrupt you Brian. You want to jump in there?
Brian: I was going to say the same thing you did that I agree with David that there would be irreconcilable differences and in fact, many different parameters of the CEV algorithm would probably affect the outcome. One example that you could give is that people tend to crystallize their moral values as they age. You could imagine somebody who was presented with utilitarianism as a young person would be more inclined toward that whereas, maybe if that person haad been presented with deontology as a young person would the person would prefer deontology as he got older and so depending on seemingly arbitrary factors such as the order in which you are presented with moral views or what else is going on in your life at the time that you confront a given moral view or 100 other inputs. The output could be sensitive to that. CEV is really a class of algorithms depending on how you tune the parameters. You could get substantially different outcomes.
Yeah, CEV is an improvement even if there’s no obvious unique target. As I said, in practice, we won’t even get pure CEV but we’ll get some kind of very rough power-weighted approximation similar to our present world of democracy and competition among various interest groups for control.
Lucas: Just to explain how I’m feeling so far. I mean Brian, I’m very sympathetic to your view but I’m also very sympathetic to David’s view. I hover somewhere in between. I like this point that David made where he quoted Russell, something along the lines that one ought to be careful when discussing ethical metaphysics such that one is not simply trying to make one’s own views and preferences objective.
David: Yeah. When one is talking about well, just in general, when one speaks about the nature for example post human super intelligence, think of the way today that the very nature and notion of intelligence is a contested term. Simply sticking the words super in front of it is just how illuminating is it. When I read someone’s account of super intelligence, I’m really reading an account of what kind of person they are, their intellect and their values. I’m sure when I discuss the nature of full spectrum super intelligence, at least now I can see what I can’t the extent to which I’m simply articulating my own limitations.
Lucas: I guess for me here to get all my partialities out of the way, I hope that objectivism is true because I think that it makes the value alignment way less messy. In the end, we could have something actually good and beautiful, which I don’t know is some preference that I have that might be objective or not just simply wrong, or confused. The descriptive picture that I think Brian is committed to, which gives rise to the MIRI and Tomasik form of anti-realism is just one where in the beginning, there was entropy and noise and many generations of stars fusing atoms into heavier elements. One day one of these disks turn into a planet and a sun shone some light on a planet, and the planet began to produce people. There’s an optimization process there in the end, which simply seems to be ultimately driven by entropy and morality seems to simply a part of this optimization process, which just works to facilitate and mediate the relations between angry mean primates like ourselves.
Brian: I would point out there’s also a lot of spandrel to morality in my opinion, specially these days not that we’re not heavily optimized by biological pressures. All these conversation that we’re having right now is a spandrel in the sense that it’s just an outgrowth of certain abilities that we evolve but it’s not at all adaptive in any direct sense.
Lucas: Right. In this view, it really just seems like morality and suffering, and all of this is just byproduct of the screaming entropy and noise of whatever led to this universe. At the same time, the objective process and I think this is the part the people who are committed to MIRI anti-realism and I guess just relativism and skepticism about ethics in general, maybe are not tapping into enough. At the same time, this objectivity is producing a very real and objective phenomenal self and story, which is caught up in suffering where suffering is really suffering and really sucks to suffer. It all seems at face value true in that moment throughout the suffering that this is real. The suffering is real. The suffering is bad. It’s pretty horrible.
This bliss is something that I would never give up or if the rest of the universe were this bliss, that would just be the most amazing thing ever. In this very subjective phenomenal, I like just experiential thing that the universe produces, the subjective phenomenal story and narrative that we live. It seems there’s just this huge tension between that and I think the anti-realism, the clear suffering of suffering and just being a human being.
Brian: I’m not sure if there’s a tension because the anti-realist agrees that humans experience suffering as meaningful and they experience it as the most important thing imaginable. There’s not really a tension and you can explore why humans quest for objectivity. There seems to be certain glow that attaches to things by saying that they’re objectivity moral. That’s just a weird quirk of human brains. I would say that ultimately, we can choose to care about what we care about whether it’s subjective or not. I often say even if objective truth exist, I don’t necessarily care what it says because I care about what I care about. It could turn out that objective truth orders you to torture squirrels. If it does, then, I’m not going to follow the objective truth. On reflection, I’m not unsatisfied at all with anti-realism because what more could you want than what you want.
Lucas: David, feel free to jump in if you’d like.
David: Well, there it’s just … there’s this temptation to oscillate between two senses of the words subjective. Subjective in neither truth nor false, and subject in the sense of first-person experience. My being in agony or you’re being in agony or someone being in despair is as I said as much an objective property of reality as the rest mass of the electron. I mean what we can be doing is working in such ways as to increase the theory to maximize the amount of subjective value in the world regardless of whether or not one believes that this has any transcendent significance with the proviso here that there is a risk that if one aims strictly speaking to maximize subjective value, that one gets the utilitronium shockwave. If one is as I said, what I personally advocate as aiming for a civilization of super intelligent bliss one is not asking people to give up their core values and preferences unless one of those core values and preferences is to keep hedonic set points unchanged. That’s not very intellectually satisfying but it’s … this idea if one is working towards some kind of census, compromise.
Lucas: I think now I want to get into a bit more just about ethical uncertainty and specifically with regards to meta-ethical uncertainty. I think that just given the kinds of people that we are, that even if we disagree about realism versus anti-realism or ascribe different probabilities to each view. We might pretty strongly converge on how we ought to do value alignment given our kinds of moral considerations that we have. I’m just curious to explore a little bit more about what you guys are most uncertain about what it would take to change your mind? What new information you would be looking for that might challenge or make you revise your metaethical view? How we might want to proceed with AI alignment given our metaethical uncertainty?
Brian: Can you do those one by one?
Lucas: Yeah, for sure. If I can remember everything I just said. First to start off, what do you guys most uncertain about within your meta-ethical theories?
Brian: I’m not very uncertain meta-ethically. I can’t actually think of what would convince me to change my metaethics because as I said, even if it turned out that metaphysically moral truth was a thing out there in some way whatever that would mean, I wouldn’t care about it except for like instrumental reasons. For example, if it was a god, then you’d have to instrumentally care about god punishing you or something but in terms of what I actually care about, it would be not connected to moral truth. Yeah, I would have to be some sort of revision of the way I conceive of my own values. I’m not sure what that would look like to be meta-ethically uncertain.
Lucas: There’s a branch of metaethics, which has to tackle this issue of meta-ethical commitment or moral commitment to meta-ethical views. If some sort of meta-ethical thing is true, why ought I to follow what is metaethically true? In your view Brian, it is just simply why ought you not to follow or why ought it not matter for you to follow what is meta-ethically true if there ends up being objective moral facts.
Brian: The squirrel example is a good illustration if ethics turned out to be, you must torture as many squirrels as possible. Then, screw moral truth. I don’t see what this abstract metaphysical thing has to do with what I care about myself. Basically, my ethics comes from empathy, seeing others in pain, wanting that to stop. Unless moral truths somehow gives insight about that, like maybe moral truths is somehow based on that kind of empathy, sophisticated way then, it would be another person giving me thoughts on morality. The metaphysical nature of it would be irrelevant. It would only be useful in so far as it would appeal to my own emotions and sense of what morality should be for me.
David: If I might interject. Undercutting my position and negative utilitarianism and suffering-focus ethics, I think it quietly likely that posthuman super intelligence, advance civilization with a hedonic range ratcheted right up to 70 to 100 or something like that. We’d look back on anyone articulating the kind of view that I am, that anyone who believes in suffering-focused ethics does and seeing it as some kind of depressive psychosis while intuitively assumes that our successes will be wiser than we are and perhaps, well they will be in many ways. Yet in another sense, I think we should be aspiring to ignorance that once we have done absolutely everything in our power to minimize mitigate, abolish and prevent suffering, I think we should forget it even existed. I hope that eventually any experience below hedonic zero will be literally inconceivable.
Lucas: Just to jump to you here David. What are your views about what you are most meta-ethically uncertain about?
David: It’s this worry that what one is doing however much one is pronouncing about the nature of reality, or the future of intelligence life in the universe and so on. What one is really doing is some kind of disguised autobiography. Given that quite a number of people sadly pain and suffering have loomed larger in my life than pleasure, turning this into deep metaphysical truth about the universe. This potentially undercuts my view. As I said, I think there are arguments against the symmetry view that suffering is self-intimatingly bad where there is nothing self-intimatingly bad about being insentient system or a system that it’s really content. Nonetheless, yeah, I take seriously the possibility that’s all I’m doing is expressing obliquely by own limitations of perspective.
Lucas: Given these uncertainties and the difficulty and expected impact of AI alignment, if we’re again committing ourselves to this MIRI view of an intelligence explosion with quickly recursive self-improving AI systems, how would you both, if you were the king of AI strategy, how would you go about allocating your metaethics and how would you go about working on the AI alignment problem and thinking about the strategy given your uncertainties and your views?
Brian: I should mention that my most probable scenario for AI is a slow take off in which lots of components of intelligence emerge piece by piece rather than a localized intelligence explosion. As far as the intelligence like if it were a hard take off localized intelligence explosion, then, yeah I think the diversity approaches that people are considering is what I would do as well. It seems to me, you have to somehow learn values because in the same way that we’ve discovered that teaching machines by learning is more powerful than teaching them by hard coding rules. You probably have to mostly learn values as well. Although, there might be hard coding mixed in. Yeah, I would just pursue a variety of approaches and the way that the current community is doing.
I support the fact that there is also a diversity of short term versus long term focus. Some people are working on concrete problems. Others are focusing on issues like decision theory and logical uncertainty and so on because I think some of those foundational issues will be very important. For example, decision theory could make a huge difference to the AI’s effectiveness as well as issues of what happens in conflict situations. Yeah, I think a diversity of approaches is valuable. I don’t have a specific advice on when I would recommend tweaking current approaches. I guess I expected that the concrete problems work will mostly be done automatically by industry because those are the kinds of problems that you need to make AI work at all. If anything, I might invest more in the kind of long-term approaches that practical applications are likely to ignore or at least put off until later.
David: Yes, because of my background assumptions are different, it’s hard for me to deal with your question. If one believes that subjects of experience that could suffer could simply emerge at different levels of abstraction, I don’t really know how to tackle this because this strikes me as a form of strong emergence. One of the reasons why philosophers don’t like strong emergence is that essentially, all bets are off. Yeah, you imagine if life hadn’t been reducible to molecular biology and hence, ultimately to content chemistry and physics. Yeah, I’m not probably the best person to answer your question.
I think in terms of real moral focus, I would like to see essentially the molecular signature of unpleasant experience identified and essentially, you’re just making it completely off limits and biologically impossible for any sentient being to suffer. If one also believes that there are or could be subjects of experience that somehow emerge in classical digital computers, then, yeah, I’m floundering my theory of mind and reality would be wrong.
Lucas: I think touching on the paper that Kaj Sotala had written on suffering risks, I think that a lot of different value systems would also converge with you on your view David. Whether or not we take the view of realism or anti-realism, I think that most people would agree with you. I think the issue comes about with again, preference conflicts where some people I think even this might be a widespread view in catholicism where you view suffering as really important because it teaches you things and/or it has some special metaphysical significance with relation to god. Within the anti-realism view, with Brian’s view, I would find it very… just dealing with varying preferences on whether or not we should be able to suffer is something I just don’t want to deal with.
Brian: Yeah, that illustrates what I was saying about I prefer my values over the collective values of humanity. That’s one example.
David: I don’t think it would be disputed that sometimes suffering can teach lessons. The question is are there any lessons that couldn’t be functionally replaced by something else. This idea that we can just offload the nasty side of life on to software. In the case of pain, nociception one knows that yeah, so they brought software systems can be program or trained up to avoid noxious stimuli without the nasty raw feels should we be doing the same thing for organic biological robots too. When it comes to this, the question of suffering, one can have quite fierce and lively disputes with someone who says that yeah, they want to retain the capacity to suffer. This is very different from involuntary suffering. I think that quite often someone can see that no, they wouldn’t want to force another sentient being to suffer against their will. It should be a matter of choice.
Lucas: To tie this all into AI alignment again, really the point of this conversation is that again, we’re doing ethics on a deadline. If you survey the top 100 AI safety researchers or AI researches in the world, you’ll see that they give a probability distribution of the likelihood of human level artificial intelligence with about a 50% probability at 2050. This, many suspect, will have enormous implications for earth originating-intelligent life and our cosmic endowment. Our normative and descriptive and applied ethical practices that we engage with are all embodiments and consequential to the sorts of meta-ethical views, which we hold, which may not even be explicit. I think many people don’t really think about metaethics very much. I think that many AI researchers probably don’t think about metaethics very much.
The end towards which AI will be aimed will largely be a consequence of some aggregate of meta-ethical views and assumptions or the meta-ethical views and assumptions of a select few. I guess Brian and David, just to tie this all together, what do you guys view as really the practicality of metaethics in general and in terms of technology and AI alignment.
Brian: As far as what you said about metaethics determining the outcome, I would say maybe the implicit metaethics will determine the outcome but I think as we discuss before, 90 some percent of the outcome will be determined by ordinary economic and political forces. Most people in politics in general don’t think about metaethics explicitly but they still engage in the process and have a big impact on the outcome. I think the same will be true in AI alignment. People will push for things they want to push for and that’ll mostly determine what happens. It’s possible that metaethics could inspire people to be more cooperative depending on how it’s framed. CEV as a practical metaethics could potentially inspire cooperation if it’s seen as an ideal to work towards, although the extent to which it can actually be achieve is questionable.
Sometimes, you might have a naïve view where a moral realist assumes that a super intelligent AI would necessarily converge to the moral truth or at least a super intelligent AI could identify the moral truth and then, maybe all you need to do is program the AI to care about the moral truth once it discovers it. Those particular naïve approaches, I think would produce the wrong outcomes because there would be no moral truth to be found. I think it’s important to be wary of that assumption that a super intelligence will figure it out on its own and we don’t need to do the hard work of loading complex human values ourselves. It seems like the current AI alignment community largely recognizes that they recognize that there’s a lot of hard work in loading values and it won’t just happen automatically.
David: In terms of metaethics, consider the nature of pain-free surgery, surgical anesthesia. When it was first introduced in the mid 19th century, it was for about 15 years controversial. There were powerful voices who spoke against it but nonetheless, very rapidly a consensus emerge and we all now, almost all take it for granted for major surgery anesthesia. It didn’t require a consensus on the nature of value and metaethics. It’s just this is the obvious given our nature. Clearly, I would hope that eventually something similar will happen not just for physical pain but also psychological pain too. Just as we now take it for granted that it was the right thing to do to eradicate smallpox, no one is seriously suggesting that we bring smallpox back and it doesn’t depend on consensus on metaethics.
I would hope that the experience below hedonic zero, which one can possibly we’ll be able to find its precise molecular signature. I hope that consensus will emerge that we should phase it out too. Sorry, this isn’t much in the way of practical guidance to today’s roboticist and AI researchers but I suppose I’m just expressing my hope here.
Lucas: No, I think I share that. I think that we have to do ethics on a deadline but I think that there are certain ethical things whose deadline is much longer or which doesn’t necessarily have a real concrete deadline. I like… with your example of the pain anesthesia drugs.
Brian: In my view, metaethics is mostly useful for people like us or other philosophers and effective altruists who can inform our own advocacy. We want to figure out what we care about and then, we go for it and push for that. Then, maybe to some extent, it may diffuse through society in certain ways but in the start, it’s just helping us figure out what we want to push for.
Lucas: There’s an extent to which the evolution of human civilization has also been an evolution of metaethical views, which are consciously or unconsciously being developed. Brian, your view is simply that 90% of what has causal efficacy over what happens in the end are going to be like military and economic and just like raw optimization forces that work on this planet.
Brian: Also, politics and memetic spandrels. For example, like people talk about the rise of postmodernism as replacement of metaethical realism with anti-realism in popular culture. I think that is a real development. One can question to what extent, it matters. Maybe it’s correlated with things like a decline in religiosity which matters more. I think that is one good example of how metaethics can actually go popular and mainstream.
Lucas: Right. Just to bring this back, I think that in terms of the AI alignment problem, I think I try to or at least I’d like to be a bit more optimistic about how much causal efficacy each part of thinking has causal efficacy over the AI alignment problem. I like to not or I tend not to think that 90% of it will in the end be due to rogue impersonal forces like you’re discussing. I think that everyone no matter who you are stands to gain from more metaethical thinking in so far as that whether you take realist or anti-realist views. The expression of your values or whatever you think your values might be whether they might be conventional or relative, or arbitrary in your view, or whether they might relate to some objectivity. They’re much likely less to be expressed and I think a reasonable in a good way, without sufficient metaethical thinking and discussion.
David: One thing I would very much hope that before for example, radiating out across the cosmos, we would sort out our problems on earth in the solar system first regardless of whether one is secular or religious, or a classical or a negative utilitarian, let’s not start thinking about colonizing nearby solar systems or anything there. Yeah, if one is an optimist or maybe thinking of opportunities forgone but at least wait a few centuries. I think in a fundamental sense, we do not understand the nature of reality and not understanding the nature of reality comes with not understanding the nature of value and disvalue or the experience of value and disvalue as Brian might put it.
Brian: Unfortunately, I’m more pessimistic than David. I think the forces of expansion will be hard to stop as they always have been historically. Nuclear weapons are something that almost everybody wishes hadn’t been developed and yet they were developed. Climate change is something that people would like to stop but it has a force of its own due to the difficulty of coordination. I think the same will be true for space colonization and AI development as well that we can make tweaks around the edges but the large trajectory will be determined by the runaway economic and technological situation that we find ourselves in.
David: I fear Brian maybe right. I used to sometimes think about the possibilities of so-called cosmic rescue missions if the rare earth hypothesis is false and suffering Darwinian life exists within our cosmological horizon. I used to imagine this idea that we would radiate out and prevent suffering elsewhere. A, I suspect the rare earth hypothesis is true but B, I suspect even if for suffering life forms do exist elsewhere within our hubble volume. It’s probably more likely humans or our successes would go out and just create more suffering or it’s a rather dark and pessimistic view in my more optimistic moments I think we will phase out suffering all together in the next few centuries but these are guesses really.
Lucas: We’re dealing with ultimately given AI and it being the most powerful optimization process or the seed optimization process to radiate out from earth. I mean we’re dealing with potential astronomical waste or astronomical value, or astronomical disvalue and if we tie this again into moral uncertainty and start thinking about William MacAskill’s work on moral uncertainty where we just do what might be like expected value calculations with regards to our moral uncertainty. We’ve tried to be very mathematical about it and consider the amount of matter and energy that we are dealing with here. Given a super intelligent optimization process coming from Earth.
I think that tying this all together and considering it all should potentially plan an important role in our AI strategy. I definitely feel very sympathetic to Brian’s views that in the end, it might all simply come down to these impersonal economic and political, and militaristic, and memetic forces which exist. Given moral uncertainty, given meta-ethical uncertainty and given the amount of matter and energy that is at stake, potentially some portion of AI strategy should play into circumventing those forces or trying to get around them or decrease them and their effects and hold on AI alignment.
Brian: Yeah. I think it’s tweaks around the edges as I said unless these approaches become very mainstream but I think the prior probability that AI alignment of the type that you would hope for becomes worldwide is low because the prior probability that any given thing becomes worldwide mainstream is low. You can certainly influence local communities who share those ideals and they can try to influence things to the extent possible.
Lucas: Right. I mean maybe something potentially more sinister is that it doesn’t need to become worldwide if there’s a singleton scenario or if the power and control over the AI is very small within a tiny organization or some smaller organization which has power in autonomy to do this kind of thing.
Brian: Yeah, I guess I would again say the probability that you will influence those people would be low. Personally, I would imagine it would be either within a government or a large corporation. Maybe we have disproportionate impact on AI developers relative to the average human. Especially as AI becomes more powerful, I would expect more and more actors to try to have an influence. Our proportional influence would decline.
Lucas: Well, I’m feel very pessimistic after all this. Morality is not real and everything’s probably going to shit because economics and politics is going to drive it all in the end, huh?
David: It’s also possible that we’re heading for a glorious future of super human bliss beyond the bounds of every day experience and that this is just the fag end of Darwinian life.
Lucas: All right. David, we’ll be having I think as you say one day we might have thoughts as beautiful as sunsets.
David: What a beautiful note to end on.
Lucas: I hope that one day we have thoughts as beautiful as sunsets and that suffering is a thing of the past whether that be objective or subjective within the context of an empty cold universe of just entropy. Great. Well, thank you so much Brian and David. Do you guys have any more questions or anything you’d like to say or any plugs, last minute things?
Brian: Yeah, I’m interested in promoting research on how you should tweak AI trajectories if you are foremost concerned about suffering. A lot of this work is being done by the Foundational Research Institute, which aims to avert s-risks especially as they are related to AI. I would encourage people interested in futurism to think about suffering scenarios in addition to extinction scenarios. Also, people who are interested in suffering-focused ethics to become more interested in futurism and thinking about how they can affect long-term trajectories.
David: Visit my websites urging the use of biotechnology to phase out suffering in favor of gradients of intelligent bliss for all sentient beings. I’d also like just to say yeah, thank you Lucas for this podcast and all the work that you’re doing.
Brian: Yeah, thanks for having us on.
Lucas: Yeah, thank you. Two Bodhisattvas if I’ve ever met them.
David: If only.
Lucas: Thanks so much guys.
If you enjoyed this podcast, please subscribe. Give it a like or share it on your preferred social media platform. We’ll be back again soon with another episode in the AI Alignment series.