Posts in this category get featured at the top of the front page.

AI Alignment Podcast: The Metaethics of Joy, Suffering, and Artificial Intelligence with Brian Tomasik and David Pearce

What role does metaethics play in AI alignment and safety? How might paths to AI alignment change given different metaethical views? How do issues in moral epistemology, motivation, and justification affect value alignment? What might be the metaphysical status of suffering and pleasure?  What’s the difference between moral realism and anti-realism and how is each view grounded?  And just what does any of this really have to do with AI?

The Metaethics of Joy, Suffering, and AI Alignment is the fourth podcast in the new AI Alignment series, hosted by Lucas Perry. For those of you that are new, this series will be covering and exploring the AI alignment problem across a large variety of domains, reflecting the fundamentally interdisciplinary nature of AI alignment. Broadly, we will be having discussions with technical and non-technical researchers across areas such as machine learning, AI safety, governance, coordination, ethics, philosophy, and psychology as they pertain to the project of creating beneficial AI. If this sounds interesting to you, we hope that you will join in the conversations by following us or subscribing to our podcasts on Youtube, SoundCloud, or your preferred podcast site/application.

If you’re interested in exploring the interdisciplinary nature of AI alignment, we suggest you take a look here at a preliminary landscape which begins to map this space.

In this podcast, Lucas spoke with David Pearce and Brian Tomasik. David is a co-founder of the World Transhumanist Association, currently rebranded Humanity+. You might know him for his work on The Hedonistic Imperative, a book focusing on our moral obligation to work towards the abolition of suffering in all sentient life. Brian is a researcher at the Foundational Research Institute. He writes about ethics, animal welfare, and future scenarios on his website “Essays On Reducing Suffering.” 

Topics discussed in this episode include:

  • What metaethics is and how it ties into AI alignment or not
  • Brian and David’s ethics and metaethics
  • Moral realism vs antirealism
  • Emotivism
  • Moral epistemology and motivation
  • Different paths to and effects on AI alignment given different metaethics
  • Moral status of hedonic tones vs preferences
  • Can we make moral progress and what would this mean?
  • Moving forward given moral uncertainty
In this interview we discuss ideas contained in the work of Brian Tomasik and David Pearce. You can learn more about Brian’s work here and here, and David’s work hereYou can hear more in the podcast above or read the transcript below.

Lucas: Hey, everyone. Welcome back to the AI Alignment podcast series with the Future of Life Institute. Today, we’ll be speaking with David Pearce and Brian Tomasik. David is a co-founder of the World Transhumanist Association, rebranded humanity plus, and is a prominent figure within the transhumanism movement in general. You might know him from his work on the Hedonistic Imperative, a book which explores our moral obligation to work towards the abolition of suffering in all sentient life through technological intervention.

Brian Tomasik writes about ethics, animal welfare and for far-future scenarios from a suffering-focused perspective on his website reducing-suffering.org. He has also helped found the Foundational Research Institute, which is a think tank that explores crucial considerations for reducing suffering in the long term future. If you have been finding this podcast interesting or useful, remember to follow us on your preferred listening platform and share the episode on social media. Today, Brian, David, and I speak about metaethics, key concepts and ideas in the space, explore the metaethics of Brian and David, and how this all relates to and is important for AI alignment. This was a super fun and interesting episode and I hope that you find it valuable. With that, I give you Brian Tomasik and David Pearce.

Thank you so much for coming on the podcast.

David: Thank you Lucas.

Brian: Glad to be here.

Lucas: Great. We can start off with you David and then, you Brian and just giving a little bit about your background, the intellectual journey that you’ve been on and how that brought you here today.

David: Yes. My focus has always been on the problem of suffering, very ancient problem, Buddhism and countless other traditions preoccupied by the problem of suffering. I’m also a transhumanist and what transhumanism brings to the problem is suffering is the idea that it’s possible to use technology, in particular biotechnology to phase out suffering, not just in humans throughout the living world and ideally replace them by gradients of intelligent wellbeing. Transhumanism is a very broad movement embracing not just radical mood enrichment but also super longevity and super intelligence. This is what brings me in and us here today in that there is no guarantee that human preoccupations are the problems of suffering are going to overlap with those of post human super intelligence.

Lucas: Awesome, and so you, Brian.

Brian: I’ve been interested in utilitarianism since I was 18 and I discovered the word. I immediately looked it up and was interested to see that the philosophy mirrored some of the things that I had been thinking about up to that point. I became interested in animal ethics and the far future. A year after that, I actually discovered David’s writings of the Hedonistic Imperative, along with other factors. His writings helped to inspire me to care more about suffering relative to the creation of happiness. Since then, I’ve been what you might call suffering-focused, which means I think that the reduction of suffering has more moral priority than other values. I’ve written about both animal ethics including wild animal suffering as well as risks of astronomical future suffering, what are called s-risks. You had a recent podcast episode with Kaj Sotala to talk about s-risks.

I, in general think that from my perspective, one important thing to think about was during AI is what sorts of outcomes could result in large amounts of suffering? We should try to steer away from those possible future scenarios.

Lucas: Given our focuses on AI alignment, I’d like to just offer a little bit of context. Today, this episode will be focusing on ethics. The AI Alignment problem is traditionally seen as something which is prominently something technical. While a large, large portion of it is technical, the end towards which the technical AI is aimed or the ethics which is imbued within it or embodied within it is still an open and difficult question. Broadly, just to have everything defined here, we can understand ethics here just a method of seeking to understand what we ought to do and what counts as moral or good.

The end goal of AI safety is to create beneficial intelligence not undirected intelligence. What beneficial exactly entails is still an open question that largely exist in the domain of ethics. Even if all the technical issues surrounding the creation of an artificial general intelligence or super intelligence are solved, we will still face deeply challenging ethical questions that will have tremendous consequences for earth-originating intelligent life. This is what is meant when it is said that we must do philosophy or ethics on a deadline. In the spirit of that, that’s why we’re going to be focusing this podcast today on metaethics and particularly the metaethics of David Pearce and Brian Tomasik, which also happen to be ethical views which are popular I would say among people interested in the AI safety community.

I think that Brian and David have enough disagreements that this should be pretty interesting. Again, just going back to this idea of ethics, I think given this goal, ethics can be seen as a lens through which to view safe AI design. It’s also a cognitive architecture to potentially be instantiated in AI through machine ethics. That would potentially make AIs ethical reasoners, ethical decision-makers, or both. Ethics can also be developed, practiced and embodied by AI researchers and their collaborators, and can also be seen as a discipline through which we can guide AI research and adjudicate it’s impacts in the world.

There is an ongoing debate about what the best path forward is for generating ethical AI, whether it’s project of machine ethics through bottom up or for top down approaches, or just a broad project of AI safety and AI safety engineering where we seek out corrigibility and docility, and alignment, and security in machine systems or probably even some combination of the two. It’s unclear what the outcome of AI will be but what is more certain though is that AI promises to produce and make relevant both age-old and novel moral considerations through areas such as algorithmic bias and technological disemployment and autonomous weapons, and privacy, big data systems, and even possible phenomenal states in machines.

We’ll even see new ethical issues with what might potentially one day be super intelligence and beyond. Given this, I think I’d like to just dive in first with you Brian and then, with you David. If you could just get into what the foundation is of your moral view? Then, afterwards, we can dive into the metaethics behind it.

Brian: Sure. At bottom, the reason that I placed foremost priority on suffering is emotion. Basically, the emotional experience of having suffered myself intensely from time to time and having empathy when I see others suffering intensely. That experience of either feeling it yourself or seeing others in extreme pain carries just a moral valence to me or a spiritual sensation you might call it that seems different from the sensation that I feel from anything else. It seems just obvious at an emotional level that say torture or being eaten alive by a predatory animal or things of that nature have more moral urgency than anything else. That’s the fundamental basis. You can also try to make theoretical arguments to come to the same conclusion. For example, people have tried to advance what’s called the asymmetry, which is the intuition that it’s bad to create a new being who will suffer a lot but it’s not wrong to fail to create a being that will be happy or at least not nearly as wrong.

From that perspective, you might care more about preventing the creation of suffering beings than about creating additional happy beings. You can also advance the idea that maybe preferences are always a negative debt that has to be repaid. Maybe when you have a preference that’s a bad thing and then, it’s only by fulfilling the preference that you erase the bad things. This would be similar to the way in which Buddhism says that suffering arises from craving. The goal is to cease the cravings which can be done either through the fulfilling the cravings, giving the organism what the organism wants or not having the cravings in the first place. Those are some potential theoretical frameworks from which to also derive a suffering-focused ethical view. For me personally, the emotional feeling is the most important basis.

David: I would very much like to echo what Brian was saying there. I mean there is something about the nature of intense suffering. One can’t communicate it to someone who hasn’t suffered. I mean someone who is for example born with congenital anesthesia or insensitivity to pain but there is something that is self-intimatingly nasty and disvaluable about suffering. However, evolution hasn’t engineered us of course to care impartially about the suffering of all sentient beings. My suffering and those of my genetic kin tends to matter far more to me than anything else. So far as we aspire to become transhuman and posthuman, we should be aspiring to this godlike perspective that takes into account the suffering of all sentient beings that the egocentric illusionist is a genetically adaptive lie.

How does this tie in to the question of posthuman super intelligence? Of course, there are very different conceptions of what posthuman super intelligence is going to be. I’ve always had what might say a more traditional conception of super intelligence in which posthuman super intelligence is going to be our biological descendants enhanced by AI but nonetheless still our descendants. However, there are what might crudely be called two other conceptions of post human super intelligence. One is this Kurzweilian fusion of humans and our machines, such that the difference between humans and our machine ceases to be relevant.

There’s another conception of super intelligence that you might say in some ways is the most radical is the intelligence explosion that was first conceived by I.J. Good but has been developed by Eliezer Yudkowsky, MIRI, and most recently by Nick Bostrom that conceives of some kind of runaway explosion, recursively self-improving AI and yes, there being no guarantee that the upshot of this intelligence explosion is going to be in any way congenial to human values as we understand them. I’m personally skeptic about the intelligence explosion in this sense but yeah, it’s worth clarifying what one means by posthuman super intelligence.

Lucas: Wonderful. Right before we dive into the metaethics behind these views and their potential relationship with AI alignment and just broadening the discussion to include ethics and exploring some of these key terms. I just like to touch on the main branches of ethics to provide some context and mapping for us. Generally, ethics is understood to have three branches, those being metaethics, normative ethics, and applied ethics. Traditionally, applied ethics is viewed as the application of normative and metaethical views to specific cases and situations to determine the moral status of said case or situation in order to decide what ought to be done.

An example of that might be applying one’s moral views to factory farming to determine whether or not it is okay to factory farm animals for their meat. The next branch moving upwards in abstraction would be normative ethics, which examines and deconstructs or constructs the principles and ethical systems we use for assessing the moral worth and permissibility of specific actions and situations. This branch is traditionally viewed as the formal ethical structures that we apply to certain situations and people are familiar with the deontological ethics and consequentialism, or utilitarianism, or virtue ethics. These are all normative ethical systems.

What we’ll be discussing today is primarily metaethics. metaethics seeks to understand morality and ethics itself. It seeks to understand the nature of ethical statements, attitudes, motivation, properties and judgments. It seeks to understand whether or not ethics relates to objective truths about the world and about people, or whether it’s just simply subjective or if all ethical statements are in fact false. Seeks to understand when people mean when they express ethical judgments or statements. This gets into things like ethical uncertainty and justification theories, and substantial theories, and semantic theories of ethics.

Obviously, these are all the intricacies of the end towards which AI maybe aimed. Given even the epistemology of metaethics and ethics in general that also have major implications for what AIs might be able to discover about ethics or what they may not be able to discover about ethics. Again today, we’ll just be focusing on metaethics and the metaethics behind David and Brian’s views. I guess just to structure this a little bit, just to really start to use the formal language of metaethics. As a little bit of background again, semantic theories is an ethics seek to address the question of what is the linguistic meaning of moral terms or judgments.

These are primarily concerned with whether or not moral statements contain truth values or are arbitrary and subjective. There are other branches within semantic theories but there are main two branches. The first of that is noncognitivism. Noncognitivism refers to a group of theories which hold that moral statements are neither true nor false because they do not express genuine propositions. Usually, these forms of noncognitive views with things like emotivism where people think that when people are expressing our moral views or attitudes like suffering is wrong, they’re simply saying an emotion like boohoo it’s a suffering. Or I’m expressing the emotion that I think that suffering merely bothers me or is bad to me. Rather than you expressing some sort of truth or false claim about the world. Standing in contrast to noncognitivism is just cognitivism, which refers to a set of theories which hold that moral sentences express genuine propositions. That means that they can have truth of false values.

This is to say that they are capable of being true or false. Turning back to Brian and David’s views, how would you each view your moral positions as you’ve expressed thus far. Would you hold yourself to a cognitivist view or a noncognitivist view. I guess we can start with you David.

David: Yes. I just say it’s just built into the nature of let’s say agony that agony is disvaluable. Now, you might say that there is nothing in the equations of physics and science that says anything over and above the experience itself, something like redness. Yeah, redness is subjective. It’s mind-dependent. Yet, unless one thinks minds don’t exist in the physical universe. Nonetheless, redness is an objective feature of the natural physical world. I would say that for reasons we simply don’t understand, pleasure-pain axis discloses the world’s inbuilt metric of value and disvalue. It’s not an open question whether something like agony is disvaluable to the victim.

Now, of course, someone might say, “Well, yes. Agony is disvaluable to you but it’s not disvaluable to me.” I would say that this reflects an epistemological limitation and that in so far as you can access what it is like to be me and I’m in agony, then you will appreciate why agony is objectively disvaluable.

Lucas: Right. The view here is a cognitivist view where you think that it is true to say that there is some intrinsic property or quality to suffering or joy that makes it I guess analytically true that it is valuable or disvaluable.

David: Yes. Well, it has to be very careful about using something like analytically because yeah, someone says that god is talking to me and it is analytically true that these voices are the voices of god. Yeah, one needs to be careful not to smuggle in too much. It is indeed very mysterious. What could be this hybrid descriptive evaluative state of finding something valuable or disvaluable. The intrinsic nature of the physical is very much an open question. I think there are good powerful reasons for thinking that the reality is exhaustively described by the equations of physics. The intrinsic nature of that stuff, the essence of the physical, the fire in the equations is controversial. Physics itself is silent.

Lucas: Right. I guess here, you would describe yourself given these views as a moral realist or an objectivist.

David: Yes, yes.

Brian: Just to jump in before we get to me. Couldn’t you say that your view is still based on mind-dependence because at least based on the thing about if somebody else were hooked up to you, that person would appreciate the badness of suffering. That’s still just dependent on that other mind’s judgment or even if you have somebody who could mind meld with the whole universe and experience all suffering at once. That would still be the dependence of that mind. That mind is judging it to be a bad thing. Isn’t it still mind-depending ultimately?

David: Mind-dependent but I would say that minds are features of the physical world and so, obviously one can argue for some kind of dualism but I’m monistic physicalist at least that’s my working assumption.

Brian: I think objective moral value usually … the definition is usually that it’s not mind-dependent. Although, maybe it just depends what definition we’re using.

David: Yes. It’s rather like something physicalism, it’s often used as a stylistic variant of materialism. One can be non-materialist physicalist and idealist. As I said, minds are objective features of the physical world. I mean at least tentatively at any rate taks seriously the idea that our experience discloses the intrinsic nature of the physical. This is obviously controversial opinion. It’s associated with someone like Galen Straussen or more likely Phil Goff but it stretches back via Grover Maxwell and Russell, ultimately to Schopenhauer. A much more conventional view of course would be that the intrinsic nature of the physical, the fire and the equations is non-experiential. Then, at sometime during the late pre-Cambrian, something happened. Not just organizational but ontological eruption into the fabric of the world first person experience.

Lucas: Just to echo what Brian was saying. The traditional objectivist or more realist view is that the way in which science is the project of interrogating third person facts like what is simply true about the person regardless of what we think about it. In some ways, I think that traditionally the moral realist view is that if morality deals with objective facts, then, these facts are third person objectively true and can be discovered through the methods and tools of ethics. In the same way that someone who might be a mathematical realist would say that one does not invent certain geometric objects rather one discovers them through the application of mathematical reasoning and logic.

David: Yes. I think it’s very tempting to think of first person facts as having some kind of second rate ontological status but as far as I’m concerned, first person facts are real. If someone is in agony or experiencing redness, these are objective tracks about the physical world.

Lucas: Brian, would you just like to jump in with the metaethics behind your own view that you discussed earlier?

Brian: Sure. On cognitivism versus noncognitivism, I don’t have strong opinions because I think some of the debate is just about how people use language, which is not a metaphysical fundamental issue. It’s just like however humans happen to use language. I think the answer to the cognitivism, noncognitivism, if I had to say something would be it’s messy probably. Humans do talk about moral statements, the way they talk about other statements, other factual statements. We use reasoning and we care about maintaining logical consistency among sets of moral statements. We treat them as regular factual statements in that regard. There maybe also be a sense in which moral statements do strongly express certain emotions. I think probably most people don’t really think about it too much.

It’s like people know what they mean when they use moral statements and they don’t have a strong theory of exactly how to describe what they mean. One analogy that you could use is I think moral statements are like swear words. They’re used to make people feel more strongly about something or express how strongly you feel about something. People think that they don’t just refer to one’s emotions and even at a subjective level. If you say my moral view is suffering as bad. That feels different than saying I like ice cream because there’s a deeper, more spiritual or more like fundamental sensation that comes along with the moral statements that doesn’t come along with the, “I like ice cream,” statements.

I think metaphysically, that doesn’t reflect anything fundamental. It just means that we feel differently about moral statements and thoughts than about nonmoral ones. Subjectively, it feels different. Yeah. I think most people just feel that difference and then, exactly how you cash out whether that’s cognitive or noncognitive is a semantic dispute. My metaphysical position is anti-realism. I think that moral statements are mind-dependent. They reflect ultimately our own preferences even if they maybe very spiritual and like deep fundamental preferences. I think Occam’s Razor favors this view because it would add complexity to the world for there to be independent truths. I’m not even sure what that would mean, based on similar reason, I reject mathematical truths and anything non-physicalist. I think moral truths, mathematical truths and so on can all be thought of as fictional constructions that we make. We can reason within these fictional universes of ethics and mathematics that we construct using physical thought processes. That’s my basic metaphysical stance.

Lucas: Just stepping back to the cognitivism and noncognitivism issue, I guess I was specifically interested in yourself. When you were expressing your own moral view earlier, did you find that it’s simply a mixture of expressing your own emotions and also, trying to express truth claims or given your anti-realism, do you think that you’re simply only expressing emotions when you’re conveying your moral view?

Brian: I think very much of myself as an emotivist. It’s very clear to me that what I’m doing when I do ethics is what the emotivist as people are doing. Yes, since I don’t believe in moral truth, it would not make sense for me to be gesturing at moral truths. Except maybe in so far as my low level brain wiring intuitively thinks in those terms.

David: Just to add to this and that although it is possible to imagine, say something you like spectrum inversion, color inversion, some people who like ice cream and some people who hate ice cream. One thing it isn’t possible to do is imagine a civilization in which an inverted pleasure-pain axis. It seems to just be a basic fact about the world that unbearable, agony and despair is experienced as disvaluable and even cases that might appear to contradict this slight that say that masochist are in fact merely confirm a claim because, yeah, I mean the masochist enjoys the intensity rewarding release of endogenous opioids when the masochist undergoes activities that might otherwise be humiliating or painful.

Lucas: Right. David, it seems you’re making a claim about there being a perfect convergence in the space of all possible minds among the pleasure-pain axis having the same sort of function. I guess I’m potentially just missing the gap or pointing out the gap between that and I guess your cognitivist objectivism?

David: It seems to be built into the nature of let’s say agony or despair itself that it is disvaluable. It’s not I’m in agony. Is this valuable or not? It’s not open question whereas anything else. However, abhorrent, your eye might regard it one can still treat it as an open question and ask, is child abuse or slavery really disvaluable? Whereas in the case of agony, it’s built in the nature of the experience itself.

Lucas: I can get behind that. I think that sometimes when I’m feeling less nihilistic about morality, I am committed to that view. I think just to push back a little bit here. I think in the space of all possible minds, I think I can imagine a mind which has a moral judgment and commitment to the maximization of suffering within itself and within the world. It’s simply … it’s perfect in that sense. It’s perfect in maximizing suffering for itself in the world and it’s judgment and moral epistemology is very brittle, such that it will never change or deviate from this. How would you deal with something like that?

David: Is it possible? I mean one can certainly imagine a culture in which displays of machismo and the ability to cope with great suffering are highly valued and would be conspicuously displayed. This would fitness enhancing but nonetheless, it doesn’t really challenge the sovereignty of their pleasure-pain axis as the axis of value and disvalue. Yeah, I would struggle to conceive some kind of intelligence that values its own despair or agony.

Brian: From my perspective, I agree with what Lucas is saying depending on how you define things. One definition of suffering could be that part of the definition is desire to avoid it. From that perspective, you could say it’s not possible for an agent to seek something that it avoids. I think you could have systems where there are different parts in conflict so you could a hedonic assessment system that outputs a signal that this is suffering but then, another system then chooses to favor the suffering. Humans even have something like this when we can override our own suffering. We might have hedonic systems that say going out in the cold is painful but then, we have other systems or other signals that override that avoidance response and cause us to go out in the cold anyway for the sake of something else. You could imagine the wiring, such that wasn’t just enduring pain for some greater good but the motivational system was actively seeking to cause the hedonic system more experiences of pain. It’s just that that would be highly nonadaptive so we don’t see that anywhere in nature.

David: I would agree with what Brian says there. Yes, very much so.

Lucas: Okay. Given these views, would you guys have expressed and starting to get a better sense of them. Another branch of metaethics here that we might be able to explore how it fits in with your guy’s theories, justification theories within metaethics. These are attempts at understanding moral epistemology and motivation for acting in accordance with morality. It attempts to answer the question of how are moral judgments to be supported or defended? If possible, how does one make moral progress? This again will include moral epistemology and in terms of AI and value alignment, if one is anti-realist as Brian is or if one is an objectivist as David is then this completely changes the way and path forward towards AI alignment and value alignment if we are realist as David is then a sufficiently robust and correct moral epistemology in an AI system could essentially realize the hedonistic imperative as David sees it, where you would just have an optimization process extending out from planet earth, which was maximizing for the objectively good hedonic states in all possible sentient beings. I guess it’s a little unclear for me how this fits in with David’s theory or how David’s theory would be implemented.

David: There is a real problem with any theory of value that makes sovereign either the minimization of suffering or classical utilitarianism. Both Buddhism and negative utilitarianism appear to have this apocalyptic implication that if overriding responsibilities to minimize suffering but no. Isn’t that cleanest, quickest, efficient way to eliminate suffering to sterilize the planet, which is now technically feasible and though one can in theory imagine cosmic rescue missions if there is sentence elsewhere. There is apparently this not so disguised apocalyptic implication. When Buddha says allegedly or hopefully I teach one thing and one thing only. Suffering and the relief of suffering, or the end of suffering, yeah, in his day, there was no way to destroy the world. Today, there is.

Much less discussed, indeed I haven’t seen it adequately or not discussed at all in the scholarly literature is that a disguised implication of a classical utilitarian ethic that gives this symmetry to pleasure and pain is that we ought to be launching something like utilitronium shockwave where utilitronium is matter and energy optimized for pure bliss. The shockwave alludes to its velocity of propagation. Though humans perhaps are extremely unlikely even if and when we’re in a position to do so to launch a utilitronium shockwave. If one imagines a notional artificial, super intelligent with a utility function of classical utilitarianism, why wouldn’t that super intelligent launch a utilitronium shockwave that maximizes the cosmic abundance of positive value within our cosmological horizon.

Personally, I would imagine a future of gradients of intelligent bliss. I think that is in fact sociologically highly likely that post-human civilization will have a hedonic range that’s very crudely and schematically as is minus 10 to zero, to plus 10. I can imagine future civilization of let’s say plus 70 to plus 100 or plus 90 to a plus 100. From the perspective classical utilitarianism and classical utilitarianism is arguably the dominant some kind of watered-down version at least is the dominant secular ethic, and academia and elsewhere. That kind of civilization is suboptimal. It’s not moral or apparently has this obligation to launch this kind of cosmic orgasm so to speak.

Lucas: Right. I mean I think just pushing a little bit back on the first thing that you said there about the very negative scenario, which I think people tend to see as an implication of a suffering reducing focused ethic where there can’t be any suffering if there’s no sentient beings. That to me isn’t very plausible because it discounts the possibility of future wellbeing. I take the view that we actually do have a moral responsibility to create more happy beings and I view a  symmetry between pain and suffering. I don’t have a particularly suffering-focused ethic where I think there’s asymmetry where I think we should alleviate suffering prior to maximizing wellbeing. I guess David, maybe you can just unpack a little bit before we jump into these justification theories about whether or not you view there as being asymmetry between suffering and wellbeing.

David: I think there’s an asymmetry. There’s this fable of Ursula Le Guin, short story, Ones Who Walk Away From Omelas. We’re invited to imagine this city of delights, vast city of incredible wonderful pleasures but the existence of Omelas, this city of delights depends on the torment and abuse of a single child. The question is would you walk away from Omelas and what does walking away from Omelas entail. Now, personally I am someone who would walk away from Omelas. The world does not have an off switch, an off button and I think if one is whether a Buddhist of a negative utilitarian, or someone who believes in suffering-focused ethics, rather than to consider these theoretical apocalyptic scenarios it is more fruitful to work with secular and religious life lovers to phase out the biology of suffering in favor of gradients of intelligent wellbeing because one of the advantages of hedonic recalibration, i.e. ratcheting up hedonic set points is that it doesn’t ask people to give up their existing values and preferences with complications.

If you ask me, just convenient, this is a rather trivial example. Imagine, 100 people, 100 different football teams. There’s simply no way to reconcile conflicting preferences but what one can do if one ratchets up everyone’s hedonic set point is to improve quality of life. By focusing on ratcheting up hedonic set points rather than trying to reconcile the irreconcilable, I think this is the potential way forward.

Brian: There are a lot of different points to comment on. I agree with David that negative utilitarians should not aim for world destruction for several reasons. One being that it would be make people turn against the cause of suffering reduction. It’s important to have other people not regard that as something to be appalled by. For example, animal rights terrorists, plausibly give the animal rights movement a pretty bad name and may set back the cause of animal rights by doing that. Negative utilitarians would almost certainly not succeed anyway, so the most likely outcome is that they hurt their own cause.

As far as David’s suggestion of improving wellbeing to reduce disagreements among competing football teams, I think that would potentially help giving people greater wealth and equality in society can reduce some tensions. I think there will always be some insatiable appetites especially from moral theories. For example, classical utilitarian has an insatiable appetite for computational resources. Egoists and other moral people may have their own insatiable appetites. We see that in the case of humans trying to acquire wealth beyond what is necessary for their own happiness. I think there will always be those agents who want to acquire as many resources as possible. The power maximizers will tend to acquire power. I think we still have additional issues of coordination and social science being used to control the thirst for power among certain segments of society.

Lucas: Sorry. Just to get this clear. It sounds like you guys are both committed to different forms of hedonic consequentialism. You’re bringing up preferences and other sorts of things. Is there a room for ultimate metaphysical value of preferences within your ethics? Or are preferences simply epistemically and functionally useful indicators of what will often lead to positive hedonics and agents within you guys as ethical theories?

Brian: Personally, I care to some degree about both preferences and hedonic wellbeing. Currently, I care some more about hedonic wellbeing just based on … from my meta-ethical standpoint, it’s ultimately my choice, what I want to care about. I happen to care a lot about hedonic suffering when I imagine that. From a different standpoint, you can argue that ultimately the golden rule for example commits you to caring about whatever it is and other organisms cares about whether that’s hedonic wellbeing or some arbitrary wish. For example, a deathbed wish would be a good example of a preference that doesn’t have hedonic content to it, whether you think it’s important to keep deathbed wishes even after a person has died ignoring side effects in terms of later generations realizing that promises are not being kept.

I think even ignoring those side effects, a deathbed wish does have some moral importance based on the idea that if I had a deathbed wish, I would strongly want it to be carried out if you are acting the way you want others to treat you. Then, you should care to some degree about other people’s deathbed wishes. Since I’m more emotionally compelled by extreme hedonic pain, that’s what I give the most weight to.

Lucas: What would your view be of an AI or machine intelligence, which has a very strong preference, whatever that computational architecture might look like a bit be flip one way rather than another. It just keeps flipping a bit back and forth, and then, you would have a preference utilitronium shockwave going out in the world. It seems intuitive to me also that we only care about preferences and so far as they … I guess this previous example does this work for me is that we only care about preferences in so far as that they have hedonic effects. I’ll bite the bullet on the deathbed wish thing and I think that ignoring side effects like if someone wishes for something and then, they die, I don’t think that we need to actually carry it out if we don’t think it will maximize hedonic wellbeing.

Brian: Ignoring the side effects. There are probably good hedonistic reasons to fulfill deathbed wishes so that current people will not be afraid that their wishes won’t be kept also. As far as the bit flipping, I think a bit flipping agent does, I think it’s preference does have moral significance but I weigh organisms in proportion to the sophistication of their minds. I care more about a single human than a single ant for example because a human has more sophisticated cognitive machinery. It can do more kinds of … have more kinds of thoughts about its own mental states. When a human has a preference, there’s more stuff going on within its brain to back that up so to speak. A very simple computer program that has a very simple preference to flip a bit doesn’t matter very much to me because there’s not a lot of substance behind that preferences. You could think of it as an extremely simple mind.

Lucas: What if it’s a super intelligence that wants to keep flipping bits?

Brian: In that case, I would give a significant way because it has so much substance in its mind. It probably has lots of internal processes that are reflecting on its own welfare so to speak. Yeah, if it’s a very sophisticated mind, I would give that significant weight. It might not override the preferences of seven billion humans combined. I tend to give less than linear weight to larger brains. As the size of the brain increases, I don’t scale the moral weight of the organism exactly linearly. That would alter reduce that utility monster inclusion.

Lucas: Given Brian’s metaethics being an anti-realist and viewing him as an emotivist, I guess the reasons or arguments that you could provide against this view would only be, they don’t refer back to any metaphysical objective, anything really. David, wouldn’t you say that in the end, it would just be your personal emotional choice whether or not to find something compelling here.

David: It’s to do with the nature of first person facts. What is it that the equations of physics ultimately describe and if you think subjectivity or at least take it seriously the conjecture of that subjectivity is the essence of the physical, the fire in the equations, then yeah, it’s just objectively in the case that first person agony is disvaluable. Here we get into some very controversial issues. I would just like to go back to one thing Brian was saying about sophistication. I don’t think it’s plausible that let’s say a pilot whale is more cognitively sophisticated than humans but it’s very much an open question whether a pilot whale with a substantially larger brain, substantially larger neocortex, substantially larger pain and pleasure centers that the intensity of experience undergone by a pilot whale let’s say may be greater than that of humans. Therefore, other things being equal, I would say that it’s so profoundly aversive states undergone by the whale matter more than a human. It’s not the level of sophistication or complexity that counts.

Lucas: Do you want to unpack a little bit your view about the hedonics versus the preferences, and whether or not preferences have any weight in your view?

David: Only indirectly weight and that ultimately, yeah, as I said I think what matters is the pleasure-pain axis and preferences only matter in so far as they impact that. Thanks to natural selection, we have countless millions and billions of preferences that are being manufactured all the time as social primates countless preferences conflict with each other. There is simply no way to reconcile a lot of them. Whereas one can continue to enrich and enhance wellbeing so, yeah sure. Other things being equal satisfy people’s preferences. In so many contexts, it is logically impossible to do so from politics, the middle east, interpersonal relationships, the people’s desire to be the world famous this, that or the other. It is logically impossible to satisfy a vast number of preferences.

Lucas: I think it would be interesting and useful to dive into, within justification theories, like moral epistemology and ethical motivation. I think I want to turn to Brian now. Brian, I’m so curious to know if it’s possible given your view of anti-realism and suffering focused ethics, whether or not you can make moral progress or what it means to make moral progress. How does one navigate the realm of moral issues in your view, given the metaethics that you hold? Why ought I or others, or why not ought I or others to follow your ethics or not?

Brian: Moral progress I think can be thought of as many people have a desire to improve their own moral views using standards of improvement that they choose. For example, a common standard would be I think that the moral views that I will hold after learning more, I will generally now defer to those views as the better ones. There might be some exceptions especially if you get too much into some subject area that distorts your thinking relative to the way it was before. Basically, you can think of brain state changes as either being approved of or not approved of by the current state. Moral progress would consist of doing updates to your brain that you approve of, like installing updates to computer that you choose to install.

That’s what moral progress would be. Basically, you designated which changes do I want to happen and then, if those happen according to the rules then it’s on a progress relatively to what my current state thought. You can have failures of goal preservation. The example that Eliezer Yudkowsky gives is if you give Gandhi a pill that would make him want to kill people. He should not take it because that would change his goals in a way that his current goals don’t approve of. That would be moral anti-progress relative to Gandhi’s current goals. Yeah, that’s how I would think of it. Different people have different preferences about how much you can call preference idealization.

Preference idealization is the idea of imagining what preferences you would hold if you knew more, were smarter, had more experiences, and so on. Different people couldn’t want different amounts of preference idealization. There are some people who say I have almost no idea what I currently value and I want to defer that to an artificial intelligence to help me figure that out. In my case, it’s very clear to me that extreme suffering is what I want to continue to value and if I change from that stance, that would be a failure of goal preservation relative to my current values. There are still questions on which I do have significant uncertainty in a sense that I would defer to my future self.

For example, the question of how to weigh different brain complexities against each other is something where I still have significant uncertainty. The question of how much weight to give to what’s called higher order theory in consciousness versus first order theories basically how much you think that high level thoughts are an important component of what consciousness is. That’s an issue where I have significant moral uncertainty. There are issues where I want to learn more, think more about it, have more other people think about it before I make up my mind fully on what I think about that. Then, why should you hold my moral view? The real answer is because I want you to and I’ll try to come up with arguments to make it sound more convincing to you.

David: I find subjectivism troubling. I support my football team is Manchester United. I wouldn’t take a pill, less induced me to support Manchester City because that would subvert my values in some sense. Nonetheless, ultimately, support for Manchester United is arbitrary. It is a support for the reduction of suffering merely a kin to I once support lets say of Manchester United.

Brian: I think metaphysically, they’re the same. It feels very different. There’s more of a spiritual, like your whole being is behind reduction of suffering in the way that’s not true for football teams. Ultimately, there’s no metaphysical difference.

Intentional objects ultimately are arbitrary that natural selection has eschewed us a define certain intentional objects. This is philosophy jargon for the things we care about, whether it’s a football or politics, or anything. Nonetheless, it’s unlike these arbitrary intentional objects, it just seems to built into the nature of agony or despair that they are disvaluable. It’s simply not possible to instantiate such states and find it an open question whether they’re disvaluable or not.

Brian: I don’t know if we want to debate now but I think it is possible. I mean we already have examples of one organism who finds the suffering of another organism to be possibility valuable.

David: They are not mirror-touch synesthete. They do not accurately perceive what is going on and in so far as one does either as a mirror-touch synesthete or can do the equivalent of a Vulcan mind meld or something like that, one is not going to perceive the disvaluable as valuable. Its an epistemological limitation.

Brian: My objection to that is it depends how you hook up the wires between the two minds. Like if you hook up one person suffering to another person’s suffering, then the second person will say it’s also bad. If you hook up one person’s suffering neurons to another person’s pleasure neurons, then, the second person will say it’s good. It just depends how you hook up the wires.

David: It’s not all or nothing but if one is let’s say a mirror-touch synesthete today and someone’s, they stub their toe and you have an experience of pain, it’s simply not possible to take pleasure in their stubbing their toe. I think if one does have this notional god’s eye perspective, an impartial view from nowhere that one will act accordingly.

Brian: I disagree with that because I think you can always imagine just reversing the motivational wires so to speak. Just flip the wire that says this is bad. Flip it to saying this is good in terms of the agent’s motivation.

David: Right. Yes. I was trying to visualize what this would entail.

Brian: Even in a synesthete example, just imagine a brain where the same stimulus currently in normal humans, this stimulus triggers negative emotional responses just have the neurons hook up to the positive emotional responses instead.

David: Once again, wouldn’t this be an epistemological limitation rather than some deep metaphysical truth about the world?

Brian: Well, it depends how you define epistemology but you could be a psychopath where you correctly predict another organism’s behavior but you don’t care. You can have a difference between beliefs and motivations. The beliefs could correctly recognize this I think but the motivations could have the wires flipped such that there’s motivation to cause more of the suffering.

David: It’s just that I would say that the psychopath has an epistemological limitation in that the psychopath does not adequately take into account other perspectives. In that sense, psychopath lacks an adequate theory of mind. The psychopath is privileging one particular here and now over other here and nows, which is not metaphysically sustainable.

Brian: It might be a definitional dispute like whether you can consider having proper motivation to be part of epistemological accuracy or not. It seems that you’re saying if you’re not properly motivated to reduce … you don’t have proper epistemological access to it by definition.

David: Yes. One has to be extremely careful with using this term by definition. Yes. I would say that we are all to some degree sociopathic. One is quasi sociopathic to one’s future self for example and so far is one let’s say doesn’t prudently save but squanders money and stuff. We are far more psychopathic towards other sentient beings because one is failing to fully to take into account their perspective. It’s hardwired epistemological limitation. One thing I would very much agree with Brian on is moral uncertainty and being prepared to reflection and take into account other perspectives and allow for the possibility one can be wrong. It’s not always possible to have the luxury of moral reflection uncertainty.

If a kid is drowning, hopefully one that dashes into the water to save the kid. Is this the right thing to do? Well, what happens if the kid, this is the real story, happens to be a toddler grows up to the Adolf Hitler and plunges the world into war. One doesn’t know the long term consequences of one’s action. Wherever possible, yes, one urges reflection and caution in the context of a discussion or debate. One isn’t qualifying, one’s uncertainty, agnosticism carefully but in a more deliberative context perhaps of what one should certainly do so.

Lucas: Let’s just bring it a little bit back to the ethical epistemology behind and ethical motivation behind your hedonistic imperative given your objectivism. I guess here, it’d also be interesting to know if you could also explore key metaphysical uncertainties and physical uncertainties, and what more and how we might go about learning about the universe such that your view would be further informed.

David: Happy to launch into long spiel about my view. One thing I think it really is worth stressing is that one doesn’t need to buy into any form of utilitarianism or suffering-focused ethics to believe that we can and should phase out the biology of involuntary suffering. It’s common to all manner of secular and religious views that we should be other things being equal minimizing suffering reducing unnecessary suffering and this is one thing that technology, it could buy a technology allows us to do and support for something like universal access for implantation, genetic screening, phasing out factory farming and shutting slaughter houses, going on to essentially reprogram the biosphere.

It doesn’t involve a commitment to some particular one specific ethical or meta-ethical view. For something like pain-free surgery anesthesia, you don’t need to sign up for it to recognize it’s a good thing. I suppose my interest is very much in building bridges with other ethical traditions. Yeah, I am happy to go into some of my own personal views but I just don’t want to tie this idea that we can use bio-tech to get rid of suffering into anything quirky or idiosyncratic to me. I have a fair number of idiosyncratic views.

Lucas: It would be interesting if you’d explain whether or not you think that super intelligences or AGI will necessarily converge on what you view to be objective morality or if that is ultimately down to AI researchers to be very mindful of implementing.

David: I think there are real risk here when one starts speaking as though posthuman super intelligence is going to end up endorsing a version of one’s own views and values, which a priori ,if one thinks about, is extremely unlikely. I think too one needs to ask yeah, when I was talking about post human super intelligence, if post human super intelligence is biological descendants, I think post human super intelligence will have a recognizable descendant of pleasure-pain axis. I think it will be ratcheted up so that say experience below hedonic zero is impossible.

In that sense, I do see a convergence. By contrast, if one has a conception of post human super intelligence such that post human super intelligence may not be sentient, may not be experiential at all then, there is no guarantee that such a regime would be friendly to anything recognizably human in its values.

Lucas: The crux here there are different ways of doing value alignment and one such way is descriptively through a super intelligence being able to gain enough information about the set of all values that human beings have and say aligning to those or to some fraction of those or to some idealized version of those through something like a coherent extrapolated volition. Another one is where we embed a moral epistemology within the machine system, so that the machine becomes an ethical reasoner, almost a moral philosopher in its own right. It seems that given your objectivist ethics that with that moral epistemology, it would be able to converge on what is true. Do these different paths forward makes sense to you and/or it also seems that the role of mind melding seems to be very crucial and core to the realization of the correct ethics in your view?

David: With some people, their hearts sinks when the topic of machine consciousness crops up because they know it’s going to be a long inconclusive philosophical discussion and a shortage of any real empirical tests. Yeah, I will just state. I do not think a classical digital computer is capable of phenomenal binding, therefore it will not understand the nature of consciousness or pleasure and pain, and I see the emotion of value and disvalue is bound with the pleasure-pain axis. In that sense, I think what we’re calling machine artificial general intelligence, in one sense it’s invincibly ignorant. I know a lot of people would disagree with this description but if you think humans or at least some humans spend a lot of their time thinking about, talking about, exploring consciousness and it’s all varieties in some cases exploring psychedelia, what are we doing? There are vast range of cognitive domains that are completely, cognitively inaccessible to digital computers.

Lucas: Putting aside the issue of machine consciousness, it seems that being able to first-person access hedonic states provides a extremely foundational and core motivational or at least epistemological role in your ethics David.

David: Yes. I mean part of intelligence involves being able to distinguish the important from the trivial, which ultimately as far as I can see boils down to the pleasure-pain axis. Digital zombies have no conception of what is important or what is trivial I would say.

Lucas: Why would that be if a true zombie in the David Chalmers sense is functionally isomorphic to a human. Presumably that zombie would properly care about suffering because all of its functional behavior is the same. Do you think in the real world, digital computers can’t do the same functional computation that a human brain does?

David: None of us have the slightest idea how one would set about programming a computer to do the kinds of things that humans are doing when they talk about and discuss consciousness when they take psychedelics or discuss the nature of the self. I’m not saying work arounds are impossible. I just don’t think they’re spontaneously going to happen.

Brian: I agree. Just like building intelligence itself, it requires a lot of engineering to create those features of humanlike psychology.

Lucas: I don’t see why it would be physically or technically impossible to instantiate an emulation of that architecture or an architecture that’s basically identical to it in a machine system. I don’t understand why computer architecture, computer substrate is really so different from biological architecture or substrate such that it’s impossible for this case.

David: It’s whether one feels the force of the binding problem or not. The example one can give, imagine the population of the USA are skull bound minds, imagine them implementing any kind of computation you like. They are ultra fast, electromagnetic signaling far faster than the retro chemical signaling and the CNS is normally conceived. Nonetheless, short of a breakdown with monistic physicalism, there is simply no way that the population of the USA is spontaneously going to become subject to experience to apprehend perceptual objects. Essentially, all you have is a micro experiential zombie. The question is why are 86 billion odd membrane bound supposedly classical neurons any different?

Why aren’t we micro experiential zombies? One way to appreciate, i think, the force, the adaptive role of phenomenal binding is to look at syndromes where binding even harshly breaks down such as simultanagnosia where the subject can only see one thing at once. Or motion blindness or akinetopsia, where one can’t apprehend motion or severe forms of schizophrenia where there is no longer any unitary self. Somehow right now, you instantiate a unitary world simulation populated by multiple phenomenally bound dynamical objects and this is tremendously fitness enhancing.

The question is how can a bunch of membrane-bound nerve cells, a pack of neurons carry out what is classically impossible. I mean one can probe the CNS with temporary course grained and neuro scans… individual feature process, edge detectors, motion detectors, color detectors. Apparently, there are no perceptual objects there. How is it that right now that your mind/brain is capable of running this egocentric world simulation in almost real time. It’s astonishing computational feat. I argue for a version of quantum mind but one needn’t buy into this to recognize that it’s profound an unsolved problem. I mean why aren’t we like the population of the USA?

Lucas: Just to bring this back to the AI alignment problem and putting aside issues in phenomenal binding, and consciousness for a moment. Putting aside also the conception that super intelligence is likely to be some sort of biologic instantiation if we imagine the more AI safety mainstream approach, the MIRI idea of there being simply a machine super intelligence. It seems that in your view David and I think here this elucidates a lot of the interdependencies and difficulties where one’s meta-ethical views are intertwined in the end with what is true about consciousness and computation. It seems that in your view, close to or almost maybe perhaps impossible to actually do AI alignment or value alignment on machine super intelligence.

David: It is possible to do value alignment but I think the real worry is that if you take the MIRI scenario seriously, this recursively self-improving software that will somehow … This runaway intelligence. There’s no knowing where it may lead by MIRI as far as I know have very different conception of the nature of consciousness and value. I’m not aware that they tackle the binding problem. I just don’t see that unitary subjects of experience or values, or pleasure-pain axis are spontaneously going to emerge from software. It seems to involve some form of strong emergence.

Lucas: Right. I guess to tie this back and ground it a bit. It seems that the portion of your metaethics, which is going to be informed by empirical facts about consciousness and minds in general is the view in there that without access to the phenomenal pleasure-pain axis, what you view to have an intrinsic goodness or wrongness to it because it is foundationally and physically, and objectively the pleasure-pain axis of the universe, the heat and the spark in the equation I guess as you say. Without access to that, then ultimately, one will go awry in one’s ethics if one does not have access to phenomenal hedonic states given that that’s the core of value.

David: Yeah. In theory, an intelligent digital computer stroke robot could impartially pave the cosmos with either dolorium or hedonium without actually understanding the implications of what it was doing. Hedonium being or utilitronium, matter and energy optimized for pure bliss. Dolorium being matter and energy optimized for, lack of a better word, for pure misery or despair. That’s the system in question we do not understand the implications of what it was doing. That I know a lot of people do think that well, sooner or later, classical, digital computers, our machines are going to wake up. I don’t think it’s going to happen. Rather we’re not talking about hypothetical quantum computers next century and beyond. Simply an expansion of today’s programmable digital computers. I think they’re zombies and will remain zombies.

Lucas: Fully autonomous agents which are very free and super intelligent in relation to us will in your view require a fundamental access to that which is valuable, which is phenomenal states, which is the phenomenal pleasure-pain axis. Without that, it’s missing its key epistemological ingredient. It will fail in value alignment.

David: Yes, yeah, yeah. It just simply does not understand the nature of the world. It’s rather like claiming where the system is intelligent but doesn’t understand the second or of thermodynamics. It’s not a full spectrum super intelligence.

Lucas: I guess my open question there would be then, whether or not it would be possible to not have access to fundamental hedonic states but still be something of a Bodhisattva with a robust moral epistemology that was heading in the right direction or what might be objective.

David: The system in question would not understand the implications of what it was doing.

Lucas: Right. It wouldn’t understand the implications but if it got set off in that direction and it was simply achieving the goal, then I think in some cases we might call that value aligned.

David: Yes. One can imagine … Sorry Brian. Do intervene when you’re ready but yeah, one could imagine for example being skeptical of the possibility of interstellar travel for biological humans but programming systems to go out across the cosmos or at least within our cosmological horizons and convert matter and energy into pure bliss. I mean one needn’t assume that this will apply to our little bubble of civilization but watch if we do about inert matter and energy elsewhere in the galaxy. One can leave it as it is or if one is let’s say a classical utilitarian, one could convert it into pure bliss. Yeah, one can send out probes. One could restructure, reprogram matter and energy in that way.

That would be a kind of compromise solution in one sense. Keep complexity within our little tiny bubble of civilization but convert the rest of the accessible cosmos into pure bliss. Though that technically would not strictly speaking maximize the abundance of positive value in our hubble volume, nonetheless it could become extraordinarily close to it from a classical utilitarian perspective.

Lucas: Brian, do you have anything to add here?

Brian: While I disagree on many, many points, I think digital computation is capable of functionally similar enough processing as the brain does. Even that weren’t the case, a paperclip maximizer with a very different architecture would still have a very sophisticated model of human emotions and its motivations wouldn’t be hooked up to those emotions but it would understand for all other sense of the word understand human pleasure and pain. Yeah, I see it more as a challenge of hooking up the motivation properly. As far as my thoughts on alignment in general based on my metaethics, I tend to agree with the default approach like the MIRI approach, which is unsurprising because MIRI is also anti-realist on metaethics. That approach sees the task as taking human values and somehow translating them into the AI and so that could be in a  variety of different ways learning human values implicitly from certain examples or with some combination of maybe top down programming of certain ethical axioms.

That could send to exactly how you do alignment and there are lots of approaches to that. The basic idea that you need to specifically replicate the complexity of human values in machines and the complexity of the way humans reason. It won’t be there by default in any way shared between my opinion and that of the mainstream AI alignment approach.

Lucas: Do you take a view then similar to that of coherent extrapolated volition?

Brian: In case anybody doesn’t know, coherent extrapolated volition is Eliezer Yudkowsky’s idea of giving the AI the meta … You could call it a metaethics. It’s a meta rule for learning values to take humanity and think about what humanity want to want if it was smarter, knew, had more positive interactions with each other and thought faster and then, try to identify points of convergence among the values of different idealized humans. In terms of theoretical things to aim for, I think CEV is one reasonable target for reasons of cooperation among other humans. I mean if I controlled the world, I would prefer to have the AI implement my own values rather than humanities values because I care more about my values. Some human values are truly abhorrent to me and others seem at least unimportant to me.

In terms of getting everybody together to not fight endlessly over the outcome of AI in this theoretical scenario, CEV would be a reasonable target to strive for. In practice, I think that’s unrealistic like a pure CEV is unrealistic because the world does not listen to moral philosophers to any significant degree. In practice, things are determined by politics, economic power, technological and military power, and forces like that. Those determine most of what happens in the world. I think we may see approximations to CEV that are much more crude like you could say that democracy is an approximation to CEV in the sense that different people with different values, at least in theory, discuss their differences and then, come up with a compromise outcome.

Something like democracy maybe power-weighted democracy in which more powerful actors have more influence will be what ends up happening. The philosophers dream of idealizing values to perfection is unfortunately not going to happen. We can push in directions that are slightly more reflective. We can push aside towards slightly more reflection towards slightly more cooperation and things like that.

David: Couple of points that first, what to use an example we touched on before. What would be coherent extrapolated volition for all the world’s football supporters? Essentially, there’s simply no way to reconcile all their preferences. One may say that if they were fully informed football supporters, wouldn’t waste their time passionately supporting one team or another but essentially I’m not sure that the notion of coherent extrapolated volition there would make sense. Of course, there are more serious issues in football but the second thing when it comes to the nature of value, regardless of one’s metaphysical stance on whether one’s a realist or an anti-realist about value. I think it is possible by biotechnology to create states that are empirically, subjectively far more valuable than anything physiologically feasible today.

Take Prince Myshkin in Dostoevsky’s The Idiot. Like Dostoevsky was a temporal lobe epileptic and he said, “I would give my whole life for this one instant.” Essentially, there are states of consciousness that are empirically super valuable and rather than attempting to reconcile irreconcilable preferences, I think you could say that we should be and so far as we aspire to long term full spectrum super intelligence, perhaps we should be aiming to create these super valuable states. I’m not sure whether it’s really morally obligatory. I said my own focus is on the overriding importance of phasing out suffering but for someone who does give some weight or equal weight to positive experiences positively valuable experiences, that there is a vast range of valuable experience that is completely inaccessible to humans that could be engineered via biotechnology.

Lucas: A core difference here is going to be that given Brian’s view of anti-realism, AI alignment or value alignment would in the end be left to those powers which he described in order to resolve irreconcilable preferences. That is if human preferences don’t converge strongly enough after enough time and information that there are no longer irreconcilable preferences, which I guess I would suppose is probably wrong.

Brian: Which is wrong?

Lucas: That it would be wrong that human beings preferences would converge strongly enough that there would no longer be irreconcilable preferences after coherent extrapolated volition.

Brian: Okay, I agree.

Lucas: I’m saying that in the end, value alignment would be left up to economic forces, military forces, other forces to determine what comes out of value alignment. In David’s view, it would simply be down to if we could get the epistemology right and we could know enough about value and the pleasure-pain axis and the metaphysical status of phenomenal states that that would be value alignment would be to capitalize on that. I didn’t mean to interrupt you Brian. You want to jump in there?

Brian: I was going to say the same thing you did that I agree with David that there would be irreconcilable differences and in fact, many different parameters of the CEV algorithm would probably affect the outcome. One example that you could give is that people tend to crystallize their moral values as they age. You could imagine somebody who was presented with utilitarianism as a young person would be more inclined toward that whereas, maybe if that person haad been presented with deontology as a young person would the person would prefer  deontology as he got older and so depending on seemingly arbitrary factors such as the order in which you are presented with moral views or what else is going on in your life at the time that you confront a given moral view or 100 other inputs. The output could be sensitive to that. CEV is really a class of algorithms depending on how you tune the parameters. You could get substantially different outcomes.

Yeah, CEV is an improvement even if there’s no obvious unique target. As I said, in practice, we won’t even get pure CEV but we’ll get some kind of very rough power-weighted approximation similar to our present world of democracy and competition among various interest groups for control.

Lucas: Just to explain how I’m feeling so far. I mean Brian, I’m very sympathetic to your view but I’m also very sympathetic to David’s view. I hover somewhere in between. I like this point that David made where he quoted Russell, something along the lines that one ought to be careful when discussing ethical metaphysics such that one is not simply trying to make one’s own views and preferences objective.

David: Yeah. When one is talking about well, just in general, when one speaks about the nature for example post human super intelligence, think of the way today that the very nature and notion of intelligence is a contested term. Simply sticking the words super in front of it is just how illuminating is it. When I read someone’s account of super intelligence, I’m really reading an account of what kind of person they are, their intellect and their values. I’m sure when I discuss the nature of full spectrum super intelligence, at least now I can see what I can’t the extent to which I’m simply articulating my own limitations.

Lucas: I guess for me here to get all my partialities out of the way, I hope that objectivism is true because I think that it makes the value alignment way less messy. In the end, we could have something actually good and beautiful, which I don’t know is some preference that I have that might be objective or not just simply wrong, or confused. The descriptive picture that I think Brian is committed to, which gives rise to the MIRI and Tomasik form of anti-realism is just one where in the beginning, there was entropy and noise and many generations of stars fusing atoms into heavier elements. One day one of these disks turn into a planet and a sun shone some light on a planet, and the planet began to produce people. There’s an optimization process there in the end, which simply seems to be ultimately driven by entropy and morality seems to simply a part of this optimization process, which just works to facilitate and mediate the relations between angry mean primates like ourselves.

Brian: I would point out there’s also a lot of spandrel to morality in my opinion, specially these days not that we’re not heavily optimized by biological pressures. All these conversation that we’re having right now is a spandrel in the sense that it’s just an outgrowth of certain abilities that we evolve but it’s not at all adaptive in any direct sense.

Lucas: Right. In this view, it really just seems like morality and suffering, and all of this is just byproduct of the screaming entropy and noise of whatever led to this universe. At the same time, the objective process and I think this is the part the people who are committed to MIRI anti-realism and I guess just relativism and skepticism about ethics in general, maybe are not tapping into enough. At the same time, this objectivity is producing a very real and objective phenomenal self and story, which is caught up in suffering where suffering is really suffering and really sucks to suffer. It all seems at face value true in that moment throughout the suffering that this is real. The suffering is real. The suffering is bad. It’s pretty horrible.

This bliss is something that I would never give up or if the rest of the universe were this bliss, that would just be the most amazing thing ever. In this very subjective phenomenal, I like just experiential thing that the universe produces, the subjective phenomenal story and narrative that we live. It seems there’s just this huge tension between that and I think the anti-realism, the clear suffering of suffering and just being a human being.

Brian: I’m not sure if there’s a tension because the anti-realist agrees that humans experience suffering as meaningful and they experience it as the most important thing imaginable. There’s not really a tension and you can explore why humans quest for objectivity. There seems to be certain glow that attaches to things by saying that they’re objectivity moral. That’s just a weird quirk of human brains. I would say that ultimately, we can choose to care about what we care about whether it’s subjective or not. I often say even if objective truth exist, I don’t necessarily care what it says because I care about what I care about. It could turn out that objective truth orders you to torture squirrels. If it does, then, I’m not going to follow the objective truth. On reflection, I’m not unsatisfied at all with anti-realism because what more could you want than what you want.

Lucas: David, feel free to jump in if you’d like.

David: Well, there it’s just … there’s this temptation to oscillate between two senses of the words subjective. Subjective in neither truth nor false, and subject in the sense of first-person experience. My being in agony or you’re being in agony or someone being in despair is as I said as much an objective property of reality as the rest mass of the electron. I mean what we can be doing is working in such ways as to increase the theory to maximize the amount of subjective value in the world regardless of whether or not one believes that this has any transcendent significance with the proviso here that there is a risk that if one aims strictly speaking to maximize subjective value, that one gets the utilitronium shockwave. If one is as I said, what I personally advocate as aiming for a civilization of super intelligent bliss one is not asking people to give up their core values and preferences unless one of those core values and preferences is to keep hedonic set points unchanged. That’s not very intellectually satisfying but it’s … this idea if one is working towards some kind of census, compromise.

Lucas: I think now I want to get into a bit more just about ethical uncertainty and specifically with regards to meta-ethical uncertainty. I think that just given the kinds of people that we are, that even if we disagree about realism versus anti-realism or ascribe different probabilities to each view. We might pretty strongly converge on how we ought to do value alignment given our kinds of moral considerations that we have. I’m just curious to explore a little bit more about what you guys are most uncertain about what it would take to change your mind? What new information you would be looking for that might challenge or make you revise your metaethical view? How we might want to proceed with AI alignment given our metaethical uncertainty?

Brian: Can you do those one by one?

Lucas: Yeah, for sure. If I can remember everything I just said. First to start off, what do you guys most uncertain about within your meta-ethical theories?

Brian: I’m not very uncertain meta-ethically. I can’t actually think of what would convince me to change my metaethics because as I said, even if it turned out that metaphysically moral truth was a thing out there in some way whatever that would mean, I wouldn’t care about it except for like instrumental reasons. For example, if it was a god, then you’d have to instrumentally care about god punishing you or something but in terms of what I actually care about, it would be not connected to moral truth. Yeah, I would have to be some sort of revision of the way I conceive of my own values. I’m not sure what that would look like to be meta-ethically uncertain.

Lucas: There’s a branch of metaethics, which has to tackle this issue of meta-ethical commitment or moral commitment to meta-ethical views. If some sort of meta-ethical thing is true, why ought I to follow what is metaethically true? In your view Brian, it is just simply why ought you not to follow or why ought it not matter for you to follow what is meta-ethically true if there ends up being objective moral facts.

Brian: The squirrel example is a good illustration if ethics turned out to be, you must torture as many squirrels as possible. Then, screw moral truth. I don’t see what this abstract metaphysical thing has to do with what I care about myself. Basically, my ethics comes from empathy, seeing others in pain, wanting that to stop. Unless moral truths somehow gives insight about that, like maybe moral truths is somehow based on that kind of empathy, sophisticated way then, it would be another person giving me thoughts on morality. The metaphysical nature of it would be irrelevant. It would only be useful in so far as it would appeal to my own emotions and sense of what morality should be for me.

David: If I might interject. Undercutting my position and negative utilitarianism and suffering-focus ethics, I think it quietly likely that posthuman super intelligence, advance civilization with a hedonic range ratcheted right up to 70 to 100 or something like that. We’d look back on anyone articulating the kind of view that I am, that anyone who believes in suffering-focused ethics does and seeing it as some kind of depressive psychosis while intuitively assumes that our successes will be wiser than we are and perhaps, well they will be in many ways. Yet in another sense, I think we should be aspiring to ignorance that once we have done absolutely everything in our power to minimize mitigate, abolish and prevent suffering, I think we should forget it even existed. I hope that eventually any experience below hedonic zero will be literally inconceivable.

Lucas: Just to jump to you here David. What are your views about what you are most meta-ethically uncertain about?

David: It’s this worry that what one is doing however much one is pronouncing about the nature of reality, or the future of intelligence life in the universe and so on. What one is really doing is some kind of disguised autobiography. Given that quite a number of people sadly pain and suffering have loomed larger in my life than pleasure, turning this into deep metaphysical truth about the universe. This potentially undercuts my view. As I said, I think there are arguments against the symmetry view that suffering is self-intimatingly bad where there is nothing self-intimatingly bad about being  insentient system or a system that it’s really content. Nonetheless, yeah, I take seriously the possibility that’s all I’m doing is expressing obliquely by own limitations of perspective.

Lucas: Given these uncertainties and the difficulty and expected impact of AI alignment, if we’re again committing ourselves to this MIRI view of an intelligence explosion with quickly recursive self-improving AI systems, how would you both, if you were the king of AI strategy, how would you go about allocating your metaethics and how would you go about working on the AI alignment problem and thinking about the strategy given your uncertainties and your views?

Brian: I should mention that my most probable scenario for AI is a slow take off in which lots of components of intelligence emerge piece by piece rather than a localized intelligence explosion. As far as the intelligence like if it were a hard take off localized intelligence explosion, then, yeah I think the diversity approaches that people are considering is what I would do as well. It seems to me, you have to somehow learn values because in the same way that we’ve discovered that teaching machines by learning is more powerful than teaching them by hard coding rules. You probably have to mostly learn values as well. Although, there might be hard coding mixed in. Yeah, I would just pursue a variety of approaches and the way that the current community is doing.

I support the fact that there is also a diversity of short term versus long term focus. Some people are working on concrete problems. Others are focusing on issues like decision theory and logical uncertainty and so on because I think some of those foundational issues will be very important. For example, decision theory could make a huge difference to the AI’s effectiveness as well as issues of what happens in conflict situations. Yeah, I think a diversity of approaches is valuable. I don’t have a specific advice on when I would recommend tweaking current approaches. I guess I expected that the concrete problems work will mostly be done automatically by industry because those are the kinds of problems that you need to make AI work at all. If anything, I might invest more in the kind of long-term approaches that practical applications are likely to ignore or at least put off until later.

David: Yes, because of my background assumptions are different, it’s hard for me to deal with your question. If one believes that subjects of experience that could suffer could simply emerge at different levels of abstraction, I don’t really know how to tackle this because this strikes me as a form of strong emergence. One of the reasons why philosophers don’t like strong emergence is that essentially, all bets are off. Yeah, you imagine if life hadn’t been reducible to molecular biology and hence, ultimately to content chemistry and physics. Yeah, I’m not probably the best person to answer your question.

I think in terms of real moral focus, I would like to see essentially the molecular signature of unpleasant experience identified and essentially, you’re just making it completely off limits and biologically impossible for any sentient being to suffer. If one also believes that there are or could be subjects of experience that somehow emerge in classical digital computers, then, yeah, I’m floundering my theory of mind and reality would be wrong.

Lucas: I think touching on the paper that Kaj Sotala had written on suffering risks, I think that a lot of different value systems would also converge with you on your view David. Whether or not we take the view of realism or anti-realism, I think that most people would agree with you. I think the issue comes about with again, preference conflicts where some people I think even this might be a widespread view in catholicism where you view suffering as really important because it teaches you things and/or it has some special metaphysical significance with relation to god. Within the anti-realism view, with Brian’s view, I would find it very… just dealing with varying preferences on whether or not we should be able to suffer is something I just don’t want to deal with.

Brian: Yeah, that illustrates what I was saying about I prefer my values over the collective values of humanity. That’s one example.

David: I don’t think it would be disputed that sometimes suffering can teach lessons. The question is are there any lessons that couldn’t be functionally replaced by something else. This idea that we can just offload the nasty side of life on to software. In the case of pain, nociception one knows that yeah, so they brought software systems can be program or trained up to avoid noxious stimuli without the nasty raw feels should we be doing the same thing for organic biological robots too. When it comes to this, the question of suffering, one can have quite fierce and lively disputes with someone who says that yeah, they want to retain the capacity to suffer. This is very different from involuntary suffering. I think that quite often someone can see that no, they wouldn’t want to force another sentient being to suffer against their will. It should be a matter of choice.

Lucas: To tie this all into AI alignment again, really the point of this conversation is that again, we’re doing ethics on a deadline. If you survey the top 100 AI safety researchers or AI researches in the world, you’ll see that they give a probability distribution of the likelihood of human level artificial intelligence with about a 50% probability at 2050. This, many suspect, will have enormous implications for earth originating-intelligent life and our cosmic endowment. Our normative and descriptive and applied ethical practices that we engage with are all embodiments and consequential to the sorts of meta-ethical views, which we hold, which may not even be explicit. I think many people don’t really think about metaethics very much. I think that many AI researchers probably don’t think about metaethics very much.

The end towards which AI will be aimed will largely be a consequence of some aggregate of meta-ethical views and assumptions or the meta-ethical views and assumptions of a select few. I guess Brian and David, just to tie this all together, what do you guys view as really the practicality of metaethics in general and in terms of technology and AI alignment.

Brian: As far as what you said about metaethics determining the outcome, I would say maybe the implicit metaethics will determine the outcome but I think as we discuss before, 90 some percent of the outcome will be determined by ordinary economic and political forces. Most people in politics in general don’t think about metaethics explicitly but they still engage in the process and have a big impact on the outcome. I think the same will be true in AI alignment. People will push for things they want to push for and that’ll mostly determine what happens. It’s possible that metaethics could inspire people to be more cooperative depending on how it’s framed. CEV as a practical metaethics could potentially inspire cooperation if it’s seen as an ideal to work towards, although the extent to which it can actually be achieve is questionable.

Sometimes, you might have a naïve view where a moral realist assumes that a super intelligent AI would necessarily converge to the moral truth or at least a super intelligent AI could identify the moral truth and then, maybe all you need to do is program the AI to care about the moral truth once it discovers it. Those particular naïve approaches, I think would produce the wrong outcomes because there would be no moral truth to be found. I think it’s important to be wary of that assumption that a super intelligence will figure it out on its own and we don’t need to do the hard work of loading complex human values ourselves. It seems like the current AI alignment community largely recognizes that they recognize that there’s a lot of hard work in loading values and it won’t just happen automatically.

David: In terms of metaethics, consider the nature of pain-free surgery, surgical anesthesia. When it was first introduced in the mid 19th century, it was for about 15 years controversial. There were powerful voices who spoke against it but nonetheless, very rapidly a consensus emerge and we all now, almost all take it for granted for major surgery anesthesia. It didn’t require a consensus on the nature of value and metaethics. It’s just this is the obvious given our nature. Clearly, I would hope that eventually something similar will happen not just for physical pain but also psychological pain too. Just as we now take it for granted that it was the right thing to do to eradicate smallpox, no one is seriously suggesting that we bring smallpox back and it doesn’t depend on consensus on metaethics.

I would hope that the experience below hedonic zero, which one can possibly we’ll be able to find its precise molecular signature. I hope that consensus will emerge that we should phase it out too. Sorry, this isn’t much in the way of practical guidance to today’s roboticist and AI researchers but I suppose I’m just expressing my hope here.

Lucas: No, I think I share that. I think that we have to do ethics on a deadline but I think that there are certain ethical things whose deadline is much longer or which doesn’t necessarily have a real concrete deadline. I like… with your example of the pain anesthesia drugs.

Brian: In my view, metaethics is mostly useful for people like us or other philosophers and effective altruists who can inform our own advocacy. We want to figure out what we care about and then, we go for it and push for that. Then, maybe to some extent, it may diffuse through society in certain ways but in the start, it’s just helping us figure out what we want to push for.

Lucas: There’s an extent to which the evolution of human civilization has also been an evolution of metaethical views, which are consciously or unconsciously being developed. Brian, your view is simply that 90% of what has causal efficacy over what happens in the end are going to be like military and economic and just like raw optimization forces that work on this planet.

Brian: Also, politics and memetic spandrels. For example, like people talk about the rise of postmodernism as replacement of metaethical realism with anti-realism in popular culture. I think that is a real development. One can question to what extent, it matters. Maybe it’s correlated with things like a decline in religiosity which matters more. I think that is one good example of how metaethics can actually go popular and mainstream.

Lucas: Right. Just to bring this back, I think that in terms of the AI alignment problem, I think I try to or at least I’d like to be a bit more optimistic about how much causal efficacy each part of thinking has causal efficacy over the AI alignment problem. I like to not or I tend not to think that 90% of it will in the end be due to rogue impersonal forces like you’re discussing. I think that everyone no matter who you are stands to gain from more metaethical thinking in so far as that whether you take realist or anti-realist views. The expression of your values or whatever you think your values might be whether they might be conventional or relative, or arbitrary in your view, or whether they might relate to some objectivity. They’re much likely less to be expressed and I think a reasonable in a good way, without sufficient metaethical thinking and discussion.

David: One thing I would very much hope that before for example, radiating out across the cosmos, we would sort out our problems on earth in the solar system first regardless of whether one is secular or religious, or a classical or a negative utilitarian, let’s not start thinking about colonizing nearby solar systems or anything there. Yeah, if one is an optimist or maybe thinking of opportunities forgone but at least wait a few centuries. I think in a fundamental sense, we do not understand the nature of reality and not understanding the nature of reality comes with not understanding the nature of value and disvalue or the experience of value and disvalue as Brian might put it.

Brian: Unfortunately, I’m more pessimistic than David. I think the forces of expansion will be hard to stop as they always have been historically. Nuclear weapons are something that almost everybody wishes hadn’t been developed and yet they were developed. Climate change is something that people would like to stop but it has a force of its own due to the difficulty of coordination. I think the same will be true for space colonization and AI development as well that we can make tweaks around the edges but the large trajectory will be determined by the runaway economic and technological situation that we find ourselves in.

David: I fear Brian maybe right. I used to sometimes think about the possibilities of so-called cosmic rescue missions if the rare earth hypothesis is false and suffering Darwinian life exists within our cosmological horizon. I used to imagine this idea that we would radiate out and prevent suffering elsewhere. A, I suspect the rare earth hypothesis is true but B, I suspect even if for suffering life forms do exist elsewhere within our hubble volume. It’s probably more likely humans or our successes would go out and just create more suffering or it’s a rather dark and pessimistic view in my more optimistic moments I think we will phase out suffering all together in the next few centuries but these are guesses really.

Lucas: We’re dealing with ultimately given AI and it being the most powerful optimization process or the seed optimization process to radiate out from earth. I mean we’re dealing with potential astronomical waste or astronomical value, or astronomical disvalue and if we tie this again into moral uncertainty and start thinking about William MacAskill’s work on moral uncertainty where we just do what might be like expected value calculations with regards to our moral uncertainty. We’ve tried to be very mathematical about it and consider the amount of matter and energy that we are dealing with here. Given a super intelligent optimization process coming from Earth.

I think that tying this all together and considering it all should potentially plan an important role in our AI strategy. I definitely feel very sympathetic to Brian’s views that in the end, it might all simply come down to these impersonal economic and political, and militaristic, and memetic forces which exist. Given moral uncertainty, given meta-ethical uncertainty and given the amount of matter and energy that is at stake, potentially some portion of AI strategy should play into circumventing those forces or trying to get around them or decrease them and their effects and hold on AI alignment.

Brian: Yeah. I think it’s tweaks around the edges as I said unless these approaches become very mainstream but I think the prior probability that AI alignment of the type that you would hope for becomes worldwide is low because the prior probability that any given thing becomes worldwide mainstream is low. You can certainly influence local communities who share those ideals and they can try to influence things to the extent possible.

Lucas: Right. I mean maybe something potentially more sinister is that it doesn’t need to become worldwide if there’s a singleton scenario or if the power and control over the AI is very small within a tiny organization or some smaller organization which has power in autonomy to do this kind of thing.

Brian: Yeah, I guess I would again say the probability that you will influence those people would be low. Personally, I would imagine it would be either within a government or a large corporation. Maybe we have disproportionate impact on AI developers relative to the average human. Especially as AI becomes more powerful, I would expect more and more actors to try to have an influence. Our proportional influence would decline.

Lucas: Well, I’m feel very pessimistic after all this. Morality is not real and everything’s probably going to shit because economics and politics is going to drive it all in the end, huh?

David: It’s also possible that we’re heading for a glorious future of super human bliss beyond the bounds of every day experience and that this is just the fag end of Darwinian life.

Lucas: All right. David, we’ll be having I think as you say one day we might have thoughts as beautiful as sunsets.

David: What a beautiful note to end on.

Lucas: I hope that one day we have thoughts as beautiful as sunsets and that suffering is a thing of the past whether that be objective or subjective within the context of an empty cold universe of just entropy. Great. Well, thank you so much Brian and David. Do you guys have any more questions or anything you’d like to say or any plugs, last minute things?

Brian: Yeah, I’m interested in promoting research on how you should tweak AI trajectories if you are foremost concerned about suffering. A lot of this work is being done by the Foundational Research Institute, which aims to avert s-risks especially as they are related to AI. I would encourage people interested in futurism to think about suffering scenarios in addition to extinction scenarios. Also, people who are interested in suffering-focused ethics to become more interested in futurism and thinking about how they can affect long-term trajectories.

David: Visit my websites urging the use of biotechnology to phase out suffering in favor of gradients of intelligent bliss for all sentient beings. I’d also like just to say yeah, thank you Lucas for this podcast and all the work that you’re doing.

Brian: Yeah, thanks for having us on.

Lucas: Yeah, thank you. Two Bodhisattvas if I’ve ever met them.

David: If only.

Lucas: Thanks so much guys.

If you enjoyed this podcast, please subscribe. Give it a like or share it on your preferred social media platform. We’ll be back again soon with another episode in the AI Alignment series.

[end of recorded material]

Podcast: Six Experts Explain the Killer Robots Debate

Why are so many AI researchers so worried about lethal autonomous weapons? What makes autonomous weapons so much worse than any other weapons we have today? And why is it so hard for countries to come to a consensus about autonomous weapons? Not surprisingly, the short answer is: it’s complicated.

In this month’s podcast, Ariel spoke with experts from a variety of perspectives on the current status of lethal autonomous weapons systems (LAWS), where we are headed, and the feasibility of banning these weapons. Guests include ex-Pentagon advisor Paul Scharre (3:40), artificial intelligence professor Toby Walsh (40:51), Article 36 founder Richard Moyes (53:30), Campaign to Stop Killer Robots founder Mary Wareham and Bonnie Docherty of Human Rights Watch (1:03:38), and ethicist and co-founder of the International Committee for Robot Arms Control, Peter Asaro (1:32:39).

Topics discussed in this episode include:

  • the history of semi-autonomous weaponry in World War II and the Cold War (including the Tomahawk Anti-Ship Missile)
  • how major military powers like China, Russia, and the US are imbuing AI in weapons today
  • why it’s so difficult to define LAWS and draw a line in the sand
  • the relationship between LAWS proliferation and war crimes
  • FLI’s recent pledge, where over 200 organizations and over 2800 individuals pledged not to assist in developing or using LAWS
  • comparing LAWS to blinding lasers and chemical weapons
  • why there is hope for the UN to address this issue

Publications discussed in this episode include:

You can listen to the podcast above, and read the full transcript below. You can check out previous podcasts on SoundCloud, iTunes, GooglePlay, and Stitcher.

If you work with artificial intelligence in any way, and if you believe that the final decision to take a life should remain a human responsibility rather than falling to a machine, then please consider signing this pledge, either as an individual or on behalf of your organization.

Ariel: Hello. I’m Ariel Conn with the Future of Life Institute. As you may have seen, this month we announced a pledge against lethal autonomous weapons. The pledge calls upon governments and government leaders to create a future with strong international norms, regulations and laws against lethal autonomous weapons. But in the meantime signatories agree that they they will neither participate in nor support the development, manufacture, trade, or use of lethal autonomous weapons. At the time of this recording, over 220 AI-related organizations and over 2800 individuals have signed. Signatories include Google DeepMind and its founders, University College London, the XPRIZE Foundation, Clearpath Robotics, Silicon Valley Robotics, the European Association for Artificial Intelligence — and many other AI societies and organizations from around the world. Additionally, people who signed include Elon Musk, Google’s head of research and machine learning Jeff Dean, many other prominent AI researchers, such as Stuart Russell, Toby Walsh, Meredith Whitaker, Anca Dragan, Yoshua Bengio, and even politicians, like British MP Alex Sobel.

But why? We’ve all seen the movies and read the books about AI gone wrong, and yet most of the signatories agree that the last thing they’re worried about is malicious AI. No one thinks the Terminator is in our future. So why are so many people in the world of AI so worried about lethal autonomous weapons? What makes autonomous weapons so much worse than any other weapons we have today? And why is it so hard for countries to come to a consensus about autonomous weapons? Not surprisingly, the short answer is: it’s complicated. For the longer answer, we have this podcast.

For this podcast, I spoke with six of the leading experts in autonomous weapons. You’ll hear from defense expert Paul Scharre, who recently released the book Army of None: Autonomous Weapons and the Future of War. We discuss the history of autonomous and semi-autonomous weaponry, which dates back to WWII, as well as some of the more nuanced issues today that often come up for debate. AI researcher Toby Walsh looks at lethal autonomous weapons from a more technical perspective, considering the impact of autonomous weapons on society, and also the negative effects they could have for AI researchers if AI technology is used to kill people. Richard Moyes, with Article 36, coined the phrase meaningful human control, which is what much of the lethal autonomous weapons debate at the United Nations now focuses on. He describes what that means and why it’s important. Mary Wareham and Bonnie Docherty joined from Human Rights Watch, and they’re also with the Campaign to Stop Killer Robots. They talk about the humanitarian impact of lethal autonomous weapons and they explain the process going on at the United Nations today as efforts move toward a ban. Finally, my interviews end with Peter Asaro with the International Committee for Robot Arms Control and also the Campaign to Stop Killer Robots. Peter considers the issue of lethal autonomous weapons from an ethical and legal standpoint, looking at the impact killer robots could have on everything from human dignity to war crimes.

But I’ll let each of them introduce themselves better when their interviews begin. And because this podcast is so long, in the description, we’ve included the times that each interview starts, so that you can more easily jump around or listen to sections as you have time.

One quick, final point to mention is that everyone was kind enough to join at the last minute, which means not all of the audio is perfect. Most of it is fine, but please bear with us if you can hear people chattering in the background or any other similar imperfections.

And now for the first interview with Paul Scharre.

Paul: I’m Paul Scharre. I’m a senior fellow and director of the Technology and National Security Program at the Center for a New American Security. We’re a Washington, D.C.-based national security think tank that’s an independent bipartisan research organization.

Ariel: You have a background in weaponry. You were in the military, correct?

Paul: Yeah. I served about five and a half years in the US Army as a Ranger and a civil affairs team leader. I did multiple tours to Iraq and Afghanistan, and then I worked for several years after that in the Pentagon in the Office of the Secretary of Defense, where I actually worked on policy issues for emerging weapons technologies, including autonomous weapons.

Ariel: Okay. One of the very first questions that I want to start with is, how do you define an autonomous weapon?

Paul: That’s sort of the million-dollar question in a lot of ways. I don’t want to imply that all of the debate around autonomous weapons is a misunderstanding of semantics. That’s not true at all. There are clearly people who have very different views on what to do about the technology, but it is a big complicating factor because I have certainly seen, especially at the United Nations, very heated disagreements where it’s clear that people are just talking past each other in terms of what they’re envisioning.

When you say the term “autonomous weapon,” it conjures all sorts of different ideas in people’s minds, some people envisioning super advanced intelligent machines that have human-like or superhuman intelligence, something like a Terminator or Cylon from science fiction. The other people are envisioning something that might be very simple and doable today, like a Roomba with a gun on it.

Both of those things are probably really bad ideas but for very different kinds of reasons. And I think that that’s a complicating factor. So one of the dimensions of autonomy that people tend to get fixated on is how smart the weapon system is. I actually don’t think that that’s a useful way to define an autonomous weapon. Sometimes I’ll hear people say things like, “Well, this is not an autonomous weapon. This is an automated weapon because of the level of sophistication.” I don’t think that’s very helpful.

I think it’s much better, actually, to focus on the functions that the weapon is performing on its own. This is similar to the approach that the International Committee of the Red Cross has, which focuses on critical functions in weapons systems. The way that I define it in my book is I basically define an autonomous weapon as one that can complete an entire engagement cycle on its own. That is to say, it has all of the functionality needed to search for targets, to identify them, to make a decision about whether or not to attack them, and then to start the engagement and carry through the engagement all by itself.

So there’s no human in this loop, this cognitive loop, of sensing and deciding and acting out on the battlefield all by itself. That defines it in such a way that there are some things — and this is where it gets into some of the tricky definitional issues — there are weapons that have been around since World War II that I would call semi-autonomous weapons that have some degree of autonomy, that have some sensors on board. They can detect the enemy, and they can make some rudimentary kinds of actions, like maneuvering towards the enemy.

Militaries generally call these “homing munitions.” They’re torpedoes or air-to-air missiles or surface-to-air, air-to-ground missiles. They have sensors on them that might use sonar or radar or acoustic signatures. They can sense that the enemy is there, and then they use those sensors to maneuver towards the enemy to strike the target. These are generally launched by people at targets where the human knows there’s a target there.

These were originally invented in World War II by the Germans to hit Allied ships in the submarine wars in the Atlantic in World War II. You can imagine there’s a technical challenge trying to hit a moving target of a ship that’s moving. In a submarine, you’re trying to fire a torpedo at it and you might miss. So the first versions of these had microphones that could listen to the sound of the propellers from Allied ships and then steer towards where the sound was greatest so they could hit the ship.

In those cases — and this is still the case in the ones that are used today — humans see the target or have some indication of the target, maybe from a radar or sonar signature. And humans say, “There’s something out there. I want to launch this weapon to go attack it.” Those have been around for 70 years or so. I bring them up because there are some people who sometimes say, “Well, look. These autonomous weapons already exist. This is all a bunch of hullaballoo about nothing.”

I don’t think that’s really true. I think that a lot of the weapons systems that you see concern about going forward, would be things that will be quite qualitatively different, things that are going out over a wide area and searching for targets on their own, where humans don’t necessarily know where the enemy is. They might have some suspicion that the enemy might be in this area at this point in time, but they don’t know, and they launch the weapon to then find the enemy. And then, without radioing back to a human for approval, that weapon is delegated the authority to attack on its own.

By and large, we don’t see weapons like this in existence today. There are some exceptions. The Israeli Harpy drone or loitering munition is an exception. There were a couple experimental US systems in the ’80s and ’90s that are no longer in service. But this isn’t something that is in widespread use. So I do think that the debate about where we’re going in the future is at least a very valid one, and we are on the cusp of, potentially, things that will be quite different than anything we’ve seen before in warfare.

Ariel: I want to ask a quick question about the Harpy and any other type of weapon similar to that. Have those actually been used to kill anyone yet, to actually identify a target and kill some enemy? Or are they still just being used for identifying and potentially targeting people, but it’s still a human who is making the final decision?

Paul: That’s a great question. To the best of my knowledge, the Israeli Harpy has not been used in its fully autonomous mode in combat. So a couple things about how the Harpy functions. First of all, it doesn’t target people per se; it targets radars. Now, having said that, if a person is standing next to a radar that it targets, you’re probably going to be killed. But it’s not looking for individual persons. It’s looking for radar signatures and then zeroing in on them.

I mention that as important for two reasons. One, sometimes in some of the concerns that people raise about autonomous weapons, it can sometimes be unclear, at least to a listener, whether they are concerned about specifically weapons that would target humans or any weapon that might target anything on the battlefield. So that’s one consideration.

But, also, from sort of a practicality standpoint, it is easier to identify radar signatures more accurately than people who, of course, in many modern conflicts are not wearing uniforms or insignia or the things that might clearly identify them as a combatant. So a lot of the issues around distinction and accurately discriminating between combatants and noncombatants are harder for weapons that would target people.

But the answer to the question is a little bit tricky because there was an incident a couple years ago where a second-generation version of the Harpy called the Harop, or Harpy II, was used in the Nagorno-Karabakh region in the conflict there between Azerbaijan and Armenia. I think it was used by Azerbaijan and used to attack what looked like — I believe it was a bus full of fighters.

Now, by all accounts, the incident was one of actual militants being targeted — combatants — not civilians. But here was a case where it was clearly not a radar. It was a bus that would not have been emitting radar signatures. Based on my understanding of how the technology works, the Harop, the Harpy II, has a human-in-the-loop mode. The first-generation Harpy, as far as I understand, is all autonomous. The second-generation version definitely has a human-in-the-loop mode. It looks like it’s not clear whether it also has an autonomous version.

In writing the book, I reached out to the manufacturer for more details on this, and they were not particularly forthcoming. But in that instance, it looks like it was probably directed by a human, that attack, because as far as we know, the weapon does not have the ability to autonomously target something like a bus.

Ariel: Okay.

Paul: That’s a really long-winded answer. This is what actually makes this issue super hard sometimes because they depend a lot on the technical specifications of the weapon, which a) are complicated and b) are not always very transparent. Companies are not always very transparent publicly about how their weapons systems function.

One can understand why that is. They don’t want adversaries to come up with methods of fooling them and countermeasures. On the other hand, for people who are interested in understanding how companies are pushing the bounds of autonomy, that can be very frustrating.

Ariel: One of the things that I really like about the way you think is that it is very nuanced and takes into account a lot of these different issues. I think it’s tempting and easy and, I don’t want to make it sound like I’m being lazy, because I personally support banning lethal autonomous weapons. But I think it’s a really complicated issue, and so I’d like to know more about What are your thoughts on a ban?

Paul: There are two areas on this topic that I think is where it gets really complicated and really tricky. If you start with a broad principle that someone might have of something like, “Humans should be making decisions about lethal force,” or, “Only humans should be deciding to take human life.” There’s two areas where you try to … How do I put them into practice? And then you really run into some serious challenges.

And I’m not saying that makes it impossible because difficult answers you have to really sort of roll up your sleeves and get into some of the details of the issue. One is, how do you translate a broad concept like that into technical specifications of a weapon? If you start with an idea and say, “Well, only humans should be responsible for taking human life,” that seems like a reasonable idea.

How do you translate that into technical guidance that you give weapons developers over what they can and cannot build? That’s actually really hard, and I say that as having done this when I worked at the Pentagon and we tried to write guidance that was really designed to be internal to the US Defense Department and to give guidance to defense companies and to military researchers on what they could build.

It was hard to translate some of these abstract concepts like, “Humans should decide the targets,” to technical ideas. Well, what does that mean for how long the weapon can loiter over a target area or how big its sensor field should be or how long it can search for? You have to try to figure out how to put those technical characteristics into practice.

Let me give you two examples of a weapon to illustrate how this can be challenging. You might imagine a weapon today where a human says, “Ah, here’s an enemy target. I want to take that target out.” They launch a missile, and the missile flies towards the target. Let’s say it’s a tank. The missile uses a millimeter-wave seeker on the tank. It’s an active seeker, sends out millimeter-wave radar signatures to see the tank and illustrate it and sort of highlight it from the background and then zero in on the tank, because the tank’s moving and they need to have the sensor to hit the moving tank.

If the weapon and the sensor can only search for a very limited space in time and geography, then you’ve constrained the autonomy enough that the human is still in control of what it’s targeting. But as you start to open that aperture up, and maybe it’s no longer that it’s searching for one minute in a one-kilometer area, it’s now searching for eight hours over 1,000 kilometers, now you have a completely different kind of weapon system. Now it’s one that’s much more like … I make the analogy in the book of the difference between a police dog that might be set loose to go chase down a suspect, where the human says, “There’s the suspect. Dog, go get them,” versus a mad dog roaming the streets attacking anyone at will.

You have two different paradigms, but where do you draw the line in between? And where do you say, “Well, is 1 minute of loiter time, is it 2 minutes, is it 10 minutes, is it 20 minutes? What’s the geographic area?” It’s going to depend a lot on the target, the environment, what kind of clutter is in the environment. What might be an appropriate answer for tanks in an urban combat setting might be very different than naval ships on the high seas or submarines underwater or some other target in a different environment.

So that’s one challenge, and then the other challenge, of course, which is even more contested, is just sort of, “What’s the feasibility of a ban and getting countries to come together to actually agree to things?” because, ultimately, countries have militaries because they don’t trust each other. They don’t trust international law to constrain other countries from aggressive action. So regardless of whether you favor one country or another, you consider yourself an American or a Russian or a Chinese or a French or Israeli or Guinean or someone else, countries in general, they have militaries because they don’t trust others.

That makes … Even if you get countries to sign up to a ban, that’s a major challenge in getting people to actually adhere to, then, because countries are always fearful about others breaking these rules and cheating and getting the upper hand.

Ariel: We have had other bans. We’ve banned biological weapons, chemical weapons, landmines, space weapons. Do you see this as different somehow?

Paul: Yeah. So one of the things I go through in my book is, as comprehensive as I can come up with, a list of all of the attempts to regulate and control emerging technologies dating back to antiquity, dating back to ancient Indian prohibitions and Hindu Laws of Manu or the Mahabharata on poisoned and barbed arrows and fire-tip weapons.

It’s really a mixed bag. I like to say that there’s sort of enough examples of both successes and failures for people to pick whichever examples they want for whatever side they’re arguing for because there are many examples of successful bans. And I would say they’re largely successful. There are some examples of isolated incidences of people not adhering to them. Very few bans are universally adhered to. We certainly have Bashar al-Assad using chemical weapons in Syria today.

But bans that have been largely successful and that they’ve at least had a major effect in reducing these weapons include landmines, cluster munitions, blinding lasers, biological weapons, chemical weapons, using the environment as a weapon, placing nuclear weapons on the seabed or in orbit, placing any weapons of any kind on the moon or Antarctica, various regulations during the Cold War, anti-ballistic missile systems, intermediate-range nuclear ground-launch missiles, and then, of course, regulations on a number of nuclear weapons.

So there are a lot of successful examples. Now, on the other side of the coin, there are failed attempts to ban, famously, the crossbow, and that’s often brought up in these conversations. But in more recent memory, attempts of the 20th century to ban and regulate aircraft and air-delivered weapons, submarine warfare, of course the failure of attempts to ban poison gas in World War I. So there are examples on other sides of the ledger as well.

One of the things that I try to do in my book is get beyond sort of just picking examples that people like, and say, “Well, is there a pattern here? Are there some common conditions that make certain bans more likely to succeed or fail?” There’s been great scholarship done by some others before me that I was able to build on. Rebecca Crootof and Sean Welsh have done work on this trying to identify some common patterns.

I think that that’s a … If you want to look at this analytically, that’s a fruitful place to start, is to say, “Why do some bans succeed and some fail?” And then, when you’re looking at any new technology, whether it’s autonomous weapons or something else, where do they fall on this spectrum, and what does that suggest about the feasibility of certain attempts at regulation versus others?

Ariel: Can you expand on that a little bit? What have you found, or what have they found in terms of patterns for success versus failure for a ban?

Paul: I think there’s a couple criteria that seem to matter. One is the clarity of a ban is really crucial. Everyone needs to have a clear agreement on what is in and what is out. The simpler and clearer the definition is, the better. In some cases, this principle is actually baked into the way that certain treaties are written. I think the ban on cluster munitions is a great example of this, where the Cluster Munition Convention has a very, very simple principle in the treaty. It says, “Cluster munitions are banned,” full stop.

Now, if you go into the definition, now there’s all sorts of nuance about what constitutes a cluster munition or not. That’s where they get into some of the horse trading with countries ahead of time. But sort of the principle is no cluster munitions. The archetype of this importance of clarity comes in the success of restraint among European powers in using chemical weapons against each other in World War II. All sides had them. They didn’t use them on the battlefield against each other. Of course, Germany used them in the Holocaust and there were some other isolated incidences in World War II of use against others who didn’t have them.

But the European powers all had tens of thousands of tons of mustard gas stockpiled, and they didn’t use it against each other. At the outset of World War II, there were also attempts to restrain aerial bombing of cities. It was widely viewed as reprehensible. It was also illegal under international law at the time, and there were attempts on all sides to refrain from that. At the outset of the war, in fact, they did, and Hitler actually put a directive to the Luftwaffe. I talk about this a little bit in the book, although unfortunately, a lot of the detail on some of this stuff got cut for space, which I was disappointed by.

Hitler put a directive to the Luftwaffe saying that they were not to engage in bombing of civilian targets, a terror bombing, in Britain, they were only to engage in bombing military targets, not because he was a humanitarian, because he was concerned about Britain retaliating. This attempt at restraint failed when, in the middle of the night, a German bomber strayed off course and bombed central London by mistake. In retaliation, Churchill ordered the bombing of Berlin. Hitler was incensed, gave a speech the following day announcing the launch of the London Blitz.

So here’s an example where there was some slippage in the principle of what was allowed and what was not, and so you had a little bit of accidental crossing of the line in conflict. So the sharper and clearer this line is, the better. You could extrapolate from that and say it’s likely that if, for example, what World War II powers had agreed to in World War II was that they could only use poison gas against military targets but not against civilian targets, that it would have quickly escalated to civilian targets as well.

In the context of autonomous weapons, that’s one of the arguments why you’ve see some advocates of a ban say that they don’t support what is sometimes called a partition treaty, which is something that would create a geographic partition that would say you could only use autonomous weapons outside of populated areas. What some advocates of a ban have said is, “Look, that’s never going to hold in combat.” That sounds good. I’ve heard some international humanitarian lawyers say that, “Oh, well, this is how we solve this problem.” But in practice, I agree that’s not likely to be very feasible.

So clarity’s important. Another factor is the relative value of, the military value of a weapon, versus its perceived horribleness. I think, again, a good case in point here is the difference in the International Committee’s success in largely getting most countries to give up chemical weapons, but the lack of success on nuclear weapons. Nuclear weapons by any reasonable measure are far more terrible in terms of their immediate and long-lasting effects on human life and the environment, but they have much more military value, at least perceived military value. So countries are much more reluctant to give them up.

So that’s another factor, and then there are some other ones that I think are fairly straightforward but also matter, things like the access to the weapon and the number of actors that are needed to get agreement. If only two countries have the technology, it’s easier to get them on board than if it’s widely available and everyone needs to agree. But I think those are some really important factors that are significant.

One of the things that actually doesn’t matter that much is the legality of a weapons treaty. I’m not saying it doesn’t matter at all, but you see plenty of examples of legally binding treaties that are violated in wartime, and you see some examples, not a ton, but some examples of mutual restraint among countries when there is no legally binding agreement or sometimes no agreement at all, no written agreement. It’s sort of a tacit agreement to refrain from certain types of competition or uses of weapons.

All of those, I think, are really important factors when you think about the likelihood of a ban actually succeeding on any weapons — not just autonomous weapons, any weapons — but the likelihood of a ban actually succeeding in wartime.

Ariel: I’m probably going to want to come back to this, but you mentioned something that reminded me of another question that I had for you. And that is, in your book, you mentioned … I don’t remember what the weapon was, but it was essentially an autonomous weapon that the military chose not to use and then ended up giving up because it was so costly, and ultimately they didn’t trust it to make the right decisions.

I’m interested in this idea of the extent to which we trust the weapons to do whatever it is that they’re tasked with if they’re in some sort of autonomous mode, and I guess where we stand today with various weapons and whether military will have increasing trust in their weapons in the future.

Paul: The case study I think you’re referring to was an anti-ship missile called the Tomahawk anti-ship missile, or TASM, that was in service by the US Navy in the 1980s. That I would classify as an autonomous weapon. It was designed to go over the horizon to attack Soviet ships, and it could fly a search pattern. I think, actually, in the book I included the graphic of the search pattern that it would fly to look for Soviet ships.

The concern was that the way this would work in anti-surface warfare is the navy would send out patrol aircraft because they’re much faster. They have much longer range than ships. And they would scout for other enemy ships. The principle in a wartime environment is patrol aircraft would find a Soviet ship and then radio back to a destroyer the Soviet ship’s location, and the destroyer would launch a missile.

Now, the problem was, by the time the missile got there, the ship would have moved. So the ship would now have what the military would call an area of uncertainty that the ship might be in. They wouldn’t have the ability to continuously track the ship, and so what they basically would do was the missile would fly a search pattern over this area of uncertainty, and when it found the ship, it would attack it.

Now, at the time in the 1980s, the technology was not particularly advanced and it wasn’t very good at discriminating between different kinds of ships. So one of the concerns was that if there happened to be another kind of ship in the area that was not an enemy combatant, it still might attack it if it was within this search pattern area. Again, it’s originally cued by a human that had some indication of something there, but there was enough uncertainty that it flies this pattern on its own. And I only for that reason call it autonomous weapon because there was a great amount of uncertainty about sort of what it might hit and whether it might do so accurately. And it could, once launched, it would sort of find and attack all on its own.

So it was never used, and there was great hesitance about it being used. I interview a retired US Navy officer who was familiar with it at the time, and he talks about that they didn’t trust that its targeting was good enough that once they let it loose, that it might hit the right target. Moreover, there was the secondary problem, which is it might hit the wrong target, sort of a false positive, if you will, but it also might miss the Soviet ship, in which case they would have simply wasted a weapons system.

That’s another problem that militaries have, which is missiles are costly. They don’t have very many of them in their inventory. Particularly if it’s something like a ship or an aircraft, there’s only so many that they can carry physically on board. So they don’t want to waste them for no good reason, which is another practical to an operational consideration. So eventually it was taken out of service for what I understand to be all of these reasons, and that’s a little bit of guesswork, I should say, as to why it was taken out of service. I don’t have any official documentation saying that, but that’s at least, I think, a reasonable assumption about some of the motivating factors based on talking to people who were familiar with it at the time.

One of the things that I think is an important dynamic that I talk about in the book, which is that, that is really an acute problem, the wasting the weapon problem for missiles that are not recoverable. You launch it, you’re not going to get it back. If the enemy’s not there, then you’ve just wasted this thing. That changes dramatically if you have a drone that can return back. Now, all of the concerns about it hitting the wrong target and civilian casualties, those still exist and those are very much on the minds of at least Western military professionals who are concerned about civilian casualties and countries that care about the rule of law more broadly.

But this issue of wasting the weapon is less of an issue when you have something that’s recoverable and you can send it out on patrol. So I think it’s possible, and this is a hypothesis, but it’s possible that as we see more drones and combat drones in particular being put into service and intended to be used in contested areas where they may have jammed communications, that we start to see that dynamic change.

To your question about trust, I guess I’d say that there is a lot of concern at least among the military professionals that I talk to in the United States and in other Allied countries, NATO countries or Australia or Japan, that there was a lot of concern about trust in these systems, and in fact, I see much more confidence … I’m going to make a broad generalization here, okay? So forgive me, but in general I would say that I see much more confidence in the technology coming from the engineers who are building them at military research labs or at defense companies, than in the military professionals in uniform who have to push the button and use them, that they’re a little bit more skeptical of wanting to actually trust these and delegate, what they see as their responsibility, to this machine.

Ariel: What do you envision, sort of if we go down current trajectories, as the future of weaponry specifically as it relates to autonomous weaponry and potentially lethal autonomous weaponry? And to what extent do you think that international agreements could change that trajectory? And maybe, even, to what extent to you think countries might possibly even appreciate having guidelines to work within?

Paul: I’ll answer that, but let me first make an observation about most of the dialogue in the space. There’s sort of two different questions wrapped up in there. What is the likely outcome of a future of autonomous weapons? Is it a good future or a bad future? And then another one is, what is the feasibility of some kind of international attention to control or regulate or limit these weapons? Is that possible or unlikely to succeed?

What I tend to hear is that people on all sides of this issue tend to cluster into two camps. They tend to either say, “Look, autonomous weapons are horrible and they’re going to cause all these terrible effects. But if we just all get together, we can ban them. All we need to do is just … I don’t know what’s wrong with countries. We need to sit down. We need to sign a treaty and we’ll get rid of these things and our problems will be solved.”

Other people in the opposite camp say, “Bans don’t work, and anyways, autonomous weapons would be great. Wouldn’t they be wonderful? They could make war so great, and humans wouldn’t make mistakes anymore, and no innocent people would be killed, and war would be safe and humane and pristine.” Those things don’t necessarily go together. So it’s entirely possible … Like if you sort of imagine a two-by-two matrix. It’s really convenient that everybody’s views fit into those boxes very harmoniously, but it may not be possible.

I suspect that, on the whole, autonomous weapons that have no human control over targeting are not likely to make war better. It’s hard for me to say that would be a better thing. I can see why militaries might want them in some instances. I think some of the claims about the military values might be overblown, but there are certainly some in situations where you can imagine they’d be valuable. I think it kind of remains to be seen how valuable and what context, but you can imagine that.

But in general, I think that humans add a lot of value to making decisions about lethal force, and we should be very hesitant to take humans away. I also am somewhat skeptical of the feasibility of actually achieving restraint on these topics. I think it’s very unlikely the way the current international dynamics are unfolding, which is largely focused on humanitarian concerns and berating countries and telling them that they are not going to build weapons that comply with international humanitarian law.

I just don’t think that’s a winning argument. I don’t think that resonates with most of the major military powers. So I think that when you look at, actually, historical attempts to ban weapons, that right now what we’re seeing is a continuation of the most recent historical playbook, which is that elements of civil society have kind of put pressure on countries to ban certain weapons for humanitarian reasons. I think it’s actually unusual when you look at the broader historical arc. Most attempts to ban weapons were driven by great powers and not by outsiders, and most of them centered on strategic concerns, concerns about someone getting an unfair military advantage, or weapons making war more challenging for militaries themselves or making life more challenging for combatants themselves.

Ariel: When you say that it was driven by powers, do you mean you’d have, say, two powerful countries and they’re each worried that the other will get an advantage, and so they agree to just ban something in advance to avoid that?

Paul: Yeah. There’s a couple time periods that kind of seem most relevant here. One would be a flurry of attempts to control weapons that came out of the Industrial Revolution around the dawn of the 20th century. These included air balloons, or basically air-delivered weapons from balloons or airplanes, submarines, poison gas, what was called fulminating projectiles. You could think of projectiles or bullets that have fire in them or are burning, or exploding bullets, sawback bayonets. There was some restraint on their use in World War I, although it wasn’t ever written down, but there seems to be a historical record of some constraint there.

That was one time period, and at the time, that was all driven by the great powers at the time. So these were generally driven by the major European powers and then Japan as Japan sort of came rising on the international stage and particularly was involved as a naval power in the naval treaties. The Washington Naval Treaty is another example of this that attempts to control a naval arms race.

And then, of course, there were a flurry of arms control treaties during the Cold War driven by the US and the USSR. Some of them were bilateral. Many of them were multilateral but driven principally by those two powers. So that’s not to say there’s anything wrong with the current models of NGOs in civil society pushing for bans, because it’s worked and it’s worked in landmines and cluster munitions. I’m not sure that the same conditions apply in this instance, in large part because in those cases, there was real humanitarian harm that was demonstrated.

So you could really, I think, fairly criticize countries for not taking action because people were being literally maimed and killed every day by landmines and cluster munitions, whereas here it’s more hypothetical, and so you see people sort of extrapolating to all sorts of possible futures and some people saying, “Well, this going to be terrible,” but other people saying, “Oh, wouldn’t it be great,” and some say it’d be wonderful.

I’m just not sure that the current playbook that some people are using, which is to sort of generate public pressure, will work when the weapons are still hypothetical. And, frankly, they sound like science fiction. There was this recent open letter that FLI was involved in, and I was sitting in the break room at CNN before doing a short bit on this and talking to someone about this. They said, “Well, what are you going on about?” I said, “Well, some AI scientists wrote a letter saying they weren’t going to build killer robots.”

I think to many people it just doesn’t sound like a near-term problem. That’s not to say that it’s not a good thing that people are leading into the issue. I think it’s great that we’re seeing people pay attention to the issue and anticipate it and not wait until it happens. But I’m also just not sure that the public sentiment to put pressure on countries will manifest. Maybe it will. It’s hard to say, but I don’t think we’ve seen it yet.

Ariel: Do you think in terms of considering this to be more near term or farther away, are military personnel also in that camp of thinking that it’s still farther away, or within militaries is it considered a more feasible technology in the near term?

Paul: I think it depends a little bit on how someone defines the problem. If they define an autonomous weapon as human-level intelligence, then I think there’s a wide agreement. Well, at least within military circles. I can’t say wide agreement. There’s probably a lot of people on the podcast who might, maybe, have varying degrees of where they think that might be in terms of listeners.

But in military circles, I think there’s a perception that that’s just not a problem in the near term at all. If what you mean is something that is relatively simple but can go over a wide area and identify targets and attack them, I think many military professionals would say that the technology is very doable today.

Ariel: Have you seen militaries striving to create that type of weaponry? Are we moving in that direction, or do you see this as something that militaries are still hesitating to move towards?

Paul: That’s a tricky question. I’ll give you my best shot at understanding the answer to that because I think it’s a really important one, and part of it is I just don’t know because there’s not great transparency in what a lot of countries are doing. I have a fairly reasonable understanding of what’s going on in the United States but much less so in other places, and certainly in countries like authoritarian regimes like Russia and China, it’s very hard to glean from the outside what they’re doing or how they’re thinking about some of these issues.

I’d say that almost all major military powers are racing forward to invest in more robotics and autonomous artificial intelligence. I think for many of them, they have not yet made a decision whether they will cross the line to weapons that actually choose their own targets, to what I would call an autonomous weapon. I think for a lot of Western countries, they would agree that there’s a meaningful line there. They might parse it in different ways.

The only two countries that have really put any public guidance out on this are the United States and the United Kingdom, and they actually define autonomous weapon in quite different ways. So it’s not clear from that to interpret sort of how they will treat that going forward. US defense leaders have said publicly on numerous occasions that their intention is to keep a human in the loop, but then they also will often caveat that and say, “Well, look. If other countries don’t, we might be forced to follow suit.”

So it’s sort of in the loop for now, but it’s not clear how long “for now” might be. I think it’s not clear to me whether countries like Russia and China even see the issue in the same light, whether they even see a line in the same place. And at least some of the public statements out of Russia, for example, talking about fully roboticized units or some Russian defense contractors claiming to have built autonomous weapons that can do targeting on their own, it would suggest that they may not even see the light in the same way.

In fairness, that is a view that I hear among some military professionals and technologists. I don’t want to say that’s the majority view, but it is at least a significant viewpoint where people will say, “Look, there’s no difference between that weapon, an autonomous weapon that can choose its own targets, and a missile today. It’s the same thing, and we’re already there.” Again, I don’t totally agree, but that is a viewpoint that’s out there.

Ariel: Do you think that the fact that countries have these differing viewpoints is a good reason to put more international pressure on developing some sort of regulations to try to bring countries in line, bring everyone onto the same page?

Paul: Yeah. I’m a huge supporter of the process that’s been going on with the United Nations. I’m frustrated, as many are, about the slowness of the progress. Part of this is a function of diplomacy, but part of this is just that they haven’t been meeting very often. When you add up all of the times over the last five years, it’s maybe five or six weeks of meetings. It’s just not very much time they spend together.

Part of it is, of course … Let’s be honest. It’s deliberate obstinacy on the part of many nations who want to slow the progress of talks. But I do think it would be beneficial if countries could come to some sort of agreement about rules of the road, about what they would see as appropriate in terms of where to go forward.

My view is that we’ve gotten the whole conversation off on the wrong foot by focusing on this question of whether or not to have a legally binding treaty, whether or not to have a ban. If this was me, that’s not how I would have framed the discussion from the get-go, because what happens is that many countries dig in their heels because they don’t want to sign to a treaty. So they’re just like they start off on a position of, “I’m opposed.” They don’t even know what they’re opposed to. They’re just opposed because they don’t want to sign a ban.

I think a better conversation to have would be to say, “Let’s talk about the role of autonomy and machines and humans in lethal decision-making in war going forward. Let’s talk about the technology. Let’s talk about what it can do, what it can’t do. Let’s talk about what humans are good at and what they’re not good at. Let’s think about the role that we want humans to play in these kinds of decisions on the battlefield. Let’s come up with a view of what we think ‘right’ looks like, and then we can figure out what kind of piece of paper we write it down on, whether it’s a piece of paper that’s legally binding or not.”

Ariel: Talking about what the technology actually is and what it can do is incredibly important, and in my next interview with Toby Walsh, we try to do just that.

Toby: I’m Toby Walsh, I’m a Scientia Professor of Artificial Intelligence at the University of New South Wales, which is in Sydney, Australia. I’m a bit of an accidental activist, in the sense that I’ve been drawn in, as a responsible scientist, to the conversation about the challenges, the opportunities, the risks that artificial intelligence pose in fighting war. And there’s many good things that AI’s going to do in terms of reducing casualties and saving lives, but equally, I’m very concerned, like many of my colleagues are, about the risks that it poses, especially when we hand over full control to computers and remove humans from the loop.

Ariel: So that will segue nicely into the first question I had for you, and that was what first got you thinking about lethal autonomous weapons? What first gave you reason for concern?

Toby: What gave me concern about the development of lethal autonomous weapons was to see prototype weapons being developed. And knowing the challenges that AI poses — we’re still a long way away from having machines that are as intelligent as humans, and knowing the limitations, and being very concerned that we were handing over control to machines that weren’t technically capable, and certainly weren’t morally capable, of making the right choices. And therefore, too, I felt a responsibility, as any scientist, that we want AI to be used for good and not for bad purposes. Unfortunately, like many technologies, it’s completely dual use. They’re pretty much the same algorithms that are going to go into your autonomous car, that are going to identify, track, and avoid pedestrians and cyclists, are going to go into autonomous drones that are going to identify combatants, track them, and kill them. It’s a very small change to turn one algorithm into the other. And we’re going to want autonomous cars, they’re going to bring great benefits to our lives, save lots of lives, give mobility to the elderly, to the young, to the disabled. So there can be great benefits for those algorithms, but equally, the same algorithms can be repositioned and used to make warfare much more terrible and much more terrifying.

Ariel: And with AI, we’ve seen some breakthroughs in recent years, just generally speaking. Do any of those give you reason to worry that lethal autonomous weapons are closer than maybe we thought they might have been five or ten years ago? Or has the trajectory been consistent?

Toby: The recent breakthroughs have to be put into the context and that they’ve been in things like games, like the game of Go, very narrow-focus task without uncertainty. The real world doesn’t interfere when you’re playing a game of Go, it’s very precise rules and very constrained actions that you need to do and things that you need to think about. And so to us it’s good to see progress in these narrow domains. We’re still not making much progress, there’s still a huge amount to be done to build machines that are as intelligent as us. But it’s not machines as intelligent as us that I’m very worried about, although that will be in 50 or 100 years time, when we have them, that will be something that we’ll have to think about then.

It’s actually stupid AI, the fact that we’re already thinking about giving responsibility to quite stupid algorithms that really cannot make the right distinctions, either in a technical sense, in terms of being able to distinguish combatants and civilians as required by international humanitarian law. And also from a moral ground, that they really can’t decide things like proportionality, they can’t make the moral distinctions that humans have. They don’t have any of the things like empathy and consciousness that allow us to make those difficult decisions that are made in the battlefield.

Ariel: If we do continue on our current path and we aren’t able to get a ban on these weapons, what concerns do you have? What do you fear will happen? Or what do you anticipate? What type of weapons?

Toby: The problem is, I think with the debate, is that people try and conflate the concerns that we have into just one concern. And there’s different concerns at different points in time and different developments of the technology.

So the concerns I have in the next 10 years or so are definitely concerns I would have in 50 years time. Now the concerns I would have in the next 10 years or so is largely around incompetence. The machines would not be capable of making the right distinctions. And later on, there are concerns that come, as the machines become more competent, different concerns. They would actually now change the speed, the duration, the accuracy of war. And they would be very terrible weapons that any ethical safeguards that we could, at that point, build in, might be removed by bad actors. Sadly, plenty of bad actors out there who would be willing to remove any of the ethical safeguards that we might build in. So there’s not one concern. I think, unfortunately, when you hear the discussion, often it’s people try and distill it down to just a single concern at a single point in time. And depending on the state of the technology, there are different concerns as the technology gets more sophisticated and more mature. But it’s only to begin with, I would be very concerned that we will introduce a rather stupid algorithm into battlefield and they couldn’t make the right moral and right technical distinctions that are required until IHL.

Ariel: Have you been keeping track at all of what sorts of developments have been coming out of different countries?

Toby: You can see, if you just go into YouTube, you can see there are prototype weapons. Pretty much in every theater of battle — in the air, there are autonomous drones and PA systems have autonomous drones that’s now been under development for a number of years. And on the sea, the US Navy’s launched, more than a year ago now, it’s first fully autonomous ship. And interestingly, when it was launched, they said it would just have defensive measures that we can use, hunting for mines, hunting for submarines, and now they’re talking about putting weapons on it. Under the sea, we have an autonomous submarine, an autonomous submarine the size of a bus that’s believed to be halfway across the Pacific, fully autonomously. And on land there are a number of different autonomous weapons. Certainly there are prototypes of autonomous tanks, autonomous sentry robots, and the like. So there is a bit of an arms race happening and it’s certainly very worrying to see that we’re sort of locked into one of these bad equilibria, where everyone is racing to develop these weapons, in part just because the other side is.

China is definitely one of the countries to be worried about. It’s made very clear its ambitions to seek economic military dominance through the use, in large part, in technologies like artificial intelligence and it’s investing very heavily to do that. The military and commercial companies are very tightly close together. It will give it quite a unique position, perhaps even some technical advantages to the development of AI, especially in the battlefield. So it was quite surprising, all of us at the UN meeting in April were pretty surprised when China came out and called for a ban on the deployment of autonomous weapons. It didn’t say anything about development of autonomous weapons, so that’s probably not as far as I would like countries to go because if they’re developed, then you still run the risk that they will be used, accidentally or otherwise. The world is still not as safe as if they’re not actually out there with their triggers waiting to go. But it’s interesting to see that they made that call. It’s hard to know whether they’re just being disruptive or whether they really do see the serious concern we have.

I’ve talked to my colleagues, academic researchers in China around, and they’ve been, certainly in private, sympathetic to the cause of regulating autonomous weapons. Of course, unfortunately, China is a country in which it’s not possible, in many respects, to talk freely. And so they’ve made it very clear that it would be a career-killing move for them, perhaps, to speak publicly like scientists in the West have done about these issues. Nevertheless, we have drawn signatures from Hong Kong, where it is possible to speak a bit more freely, which I think demonstrates that, within the scientific community internationally, across nations, there is actually broad support for these sorts of actions. But the local politics may prevent scientists from speaking out in their home country.

Ariel: A lot of the discussion around lethal autonomous weapons focuses on the humanitarian impact, but I was wondering if you could speak at all to the potential destabilizing effect that they could have for countries?

Toby: One of the aspects of autonomous weapons that I don’t think is discussed enough is quite how destabilizing they will be as a technology. They will be relatively easy, certainly cheap to get your hands on. As I was saying when I was in Korea most recently to the Koreans, the presence of autonomous weapons would make South Korea even less safe than it is today. A country like North Korea has demonstrated it’s willing to go to great lengths to attain atomic weapons. And it would be much easier for them to obtain autonomous weapons and that would put South Korea in a very difficult situation because if they were attacked by autonomous weapons and they weren’t able to defend themselves adequately, then that would escalate and we might well find ourselves in a nuclear conflict. One that, of course, none of us would like to see. So they will be rather destabilizing, like the weapons that fall into the wrong hands, they’ll be used not just by the superpowers, they’ll be used by smaller nations, even rogue states. Potentially, they might even be used by terrorist organizations.

And then another final aspect that makes them very destabilizing is one of attribution. If someone attacks you with autonomous weapons, then it’s going to be very hard to know who’s attacked you. It’s not like you can bring one of the weapons down, you can open it up and look inside it. It’s not going to tell you who launched it. There’s not a radio signal you can follow back to a base to find out who’s actually controlling this. So it’s going to be very hard to work out who’s attacking you and the countries will deny, vehemently, that it’s them, even if they went and attacked you. So they will be perfect weapons of terror, perfect weapons for troubling nations to do their troubling with.

One other concern that I have as a scientist is the risk of the field receiving a bad reputation by the misuse of the technology. We’ve seen this in areas like genetically modified crops. The great benefits that we might have had by that technology — making crops more disease-resistant, more climate-resistant, and that we need, in fact, to deal with the pressing problems that climate change and growing population’s put on our planet — have been negated by the fact that people were distrustful of the technology. And we run a similar sort of risk, I think, with artificial intelligence. That if people see the AI being used to fight terrible wars and to be used against civilians and other people, that the technology will have a stain on it. And all the many good uses and the great potential of the technology might be at risk because people will turn against all sorts of developments of artificial intelligence. And so that’s another risk and another reason many of my colleagues feel that we have to speak out very vocally to ensure that we get the benefits and that the public doesn’t turn against the whole idea of AI being used to improve the planet.

Ariel: Can you talk about the different between an AI weapon and an autonomous weapon?

Toby: Sure. There’s plenty of good things that the military can use artificial intelligence for. In fact, the U.S. military has historically been one of the greatest funders of AI research. There’s lots of good things you can use artificial intelligence for, in the battlefield and elsewhere. No one should risk a life or limb clearing a minefield, a perfect job for a robot because it could go rogue and blow up the robot and you can replace the robot easily. Equally, filtering through all the information coming at you, making sure that you can work out who are combatants and who are civilians, using AI to help you in a situation, once again, that’s a perfect job that will actually save lives, stop some of the mistakes that inevitably happen in the fog of war. And in lots of other areas in logistics and so on, there’s lots of good things in humanitarian aid that AI will be used for.

So I’m not against the use of AI in militaries, I think I can see great potential for it to save lives, to make war a little less dangerous. But there is a complete difference when we look at removing humans completely from the decision loop in a weapon and ending up with a fully autonomous weapon where it is the machine that is making the final decision as to who lives and who dies. And as I said before, that raises many technical, moral, and legal questions that we shouldn’t go down that line. And ultimately, I think there’s a very big moral argument, which is that we shouldn’t hand over those sorts of decisions, that would be taking us into a completely new moral territory that we’ve never seen before in our lives. Warfare is a terrible thing and we sanction it, and in part because we’re risking our own lives and it should be a matter of last resort, not something that we hand over easily to machines.

Ariel: Is there anything else that you think we should talk about?

Toby: I think we’d want to talk about whether regulating autonomous weapons, regulating AI, would hinder the benefits for peaceful or non-military uses. I’m very unconcerned, as many of my colleagues, that if we regulate autonomous weapons that that will actually hinder the development, in any way at all, of the peaceful and the good uses of AI. In fact, as I had mentioned earlier, I’m actually much more fearful that if we don’t regulate, there will be a backlash against the technology as a whole and that will actually hinder the good uses of AI. So I’m completely unconcerned, just like the bans on chemical weapons have not held back chemistry, the bans on biological weapons have not held back biology, the bans on nuclear weapons have not held back the development of peaceful uses of nuclear power. So I’m completely unconcerned, as many of my colleagues are, that regulating autonomous weapons will actually hold back the field in any way at all, in fact quite the opposite.

Ariel: Regulations for lethal autonomous weapons will be more effective if the debate is framed in a more meaningful way, so I’m happy Richard Moyes could talk about how the concept of meaningful human control has helped move the debate in a more focused direction.

Richard: I’m Richard Moyes, and I am Managing Director of Article 36, which is a non-governmental organization which focuses on issues of weapons policy and weapons law internationally.

Ariel: To start, you have done a lot of work, I think you’re credited with coining the phrase “meaningful human control.” So I was hoping you could talk a little bit about first, what are some of the complications around defining whether or not a human is involved and in control, and maybe if you could explain some of the human in the loop and on the loop ideas a little bit.

Richard: We developed and started using the term meaningful human control really as an effort to try and get the debate on autonomous weapons focused on the human element, the form and nature of human engagement that we want to retain as autonomy develops in different aspects of weapons function. First of all, that’s a term that’s designed to try and structure the debate towards thinking about that human element.

I suppose, the most simple question that we raised early on when proposing this term was really a recognition that I think everybody realizes that some form of human control would be needed over new weapon technologies. Nobody is really proposing weapon systems that operate without any human control whatsoever. At the same time, I think people could also recognize that simply having a human being pressing a button when they’re told to do so by a computer screen, without really having any understanding of what the situation is that they’re responding to, having a human simply pressing a button without understanding of the context, also doesn’t really involve human control. So even though in that latter situation, you might have a human in the loop, as that phrase goes, unless that human has some substantial understanding of what the context is and what the implications of their actions are, then simply a pro forma human engagement doesn’t seem sufficient either.

So, in a way, the term meaningful human control was put forward as a way of shifting the debate onto that human element, but also putting on the table this question of, well, what’s the quality of human engagement that we really need to see in these interactions in order to feel that our humanity is being retained in the use of force.

Ariel: Has that been successful in helping to frame the debate?

Richard: I think this sort of terminology, of course, different actors use different terms. Some people talk about necessary human control, or sufficient human control, or necessary human judgment. There’s different word choices there. I think there are pros and cons to those different choices, but we don’t tend to get too hung up on the specific wording that’s chosen there. The key thing is that these are seen bundled together as being a critical area now for discussion among states and other actors in multilateral diplomatic conversation about where the limits of autonomy in weapon systems lie.

I think coming out of the Group of Governmental Experts meeting of the Convention on Conventional Weapons that took place earlier this year, I think the conclusion of that meeting was more or less that this human element really does now need to be the focus of discussion and negotiation. So one way or another, I think the debate has shifted quite effectively onto this issue of the human element.

Ariel: What are you hoping for in this upcoming meeting?

Richard: Perhaps what I’m hoping for and what we’re going to get, or what we’re likely to get, might be rather different things. I would say I’d be hoping for states to start to put forward more substantial elaborations of what they consider the necessary human control, human element in the use of force to be. More substance on that policy side would be a helpful start, to give us material where we can start to see the differences and the similarities in states’ positions.

However, I suspect that the meeting in August is going to focus mainly on procedural issues around the adoption of the chair’s report, and the framing of what’s called the mandate for future work of the Group of Governmental Experts. That probably means that, rather than so much focus on the substance, we’re going to hear a lot of procedural talk in the room.

That said, in the margins, I think there’s still a very good opportunity for us to start to build confidence and a sense of partnership amongst states and non-governmental organizations and other actors who are keen to work towards the negotiation of an instrument on autonomous weapon systems. I think building that partnership between sort of progressive states and civil society actors and perhaps others from the corporate sector, building that partnership is going to be critical to developing a political dynamic for the period ahead.

Ariel: I’d like to go back, quickly, to this idea of human control. A while back, I talked with Heather Roff, and she gave this example, I think it was the empty hanger problem. Essentially what it is is no one expects some military leader to walk down to the airplane hangar and discover that the planes have all gone off to war without anyone saying something.

I think that gets at some of the confusion as to what human control looks like. You’d mentioned briefly the idea that a computer tells a human to push a button, and the human does that, but even in fully autonomous weapon systems, I think there would still be humans somewhere in the picture. So I was wondering if you could elaborate a little bit more on maybe some specifics of what it looks like for a human to have control or maybe where it starts to get fuzzy.

Richard: I think that we recognize that in the development of weapon technologies, already we see significant levels of automation, and a degree of handing over certain functions to sensors and to assistance from algorithms and the like. There are a number of areas that I think are of particular concern to us. I think, in a way, this is to recognize that a commander needs to have a sufficient contextual understanding of where it is that actual applications of force are likely to occur.

Already, we have weapon systems that might be projected over a relatively small area, and within that area, they will identify the heat shape of an armored fighting vehicle for example, and they may direct force against that object. That’s relatively accepted in current practice, but I think it’s accepted so long as we recognize that the area over which any application of force may occur is actually relatively bounded, and it’s occurring relatively shortly after a commander has initiated that mission.

Where I think my concerns, our concerns, lie is that that model of operation could be expanded over a greater area of space on the ground, and over a longer period of time. As that period of time and that area of space on the ground increase, then the ability of a commander to actually make an informed assessment about the likely implications of the specific applications of force that take place within that envelope becomes significantly diluted, to the point of being more or less meaningless.

For us, this is linked also to the concept of attacks as a term in international law. There’s a legal obligation that bears on human commanders at their unit of the attack, so there are certain legal obligations that a human has to fulfill for an attack. Now an attack doesn’t mean firing one bullet. An attack could retain a number of applications of actual force, but it seems to us that if you simply expand the space and the time over which an individual weapon systems can identify target objects for itself, ultimately you’re eroding that notion of an attack, which is actually a fundamental building block of the structure of the law. You’re diluting that legal framework to the point of it arguably being meaningless.

We want to see a reasonably constrained period of, say, let’s call it independence of operation for a system, it may not be fully independent, but where a commander has the ability to sufficiently understand the contextual parameters within which that operation is occurring.

Ariel: Can you speak at all, since you live in the UK, on what the UK stance is on autonomous weapons right now?

Richard: I would say the UK has, so far, been a somewhat reluctant dance partner on the issue of autonomous weapons. I do see some, I think, positive signs of movement in the UK’s policy articulations recently. One of the main problems they’ve had in the past is that they adopted a definition of lethal autonomous weapon systems, which is the terminology used in the CCW. It’s undetermined what this term lethal autonomous weapon systems means. That’s a sort of moving target in the debate, which makes the discussion quite complicated.

But the UK adopted a definition of that term which was somewhat in the realm of science fiction as far as we’re concerned. They describe lethal autonomous weapon systems as having the ability to understand a commander’s intent. I think, in doing so, they were suggesting an almost human-like intelligence within the system, which is a long way away, if even possible. It’s certainly a long way away from where we are now, and where already developments of autonomy in weapon systems are causing legal and practical management problems. By adopting that sort of futuristic definition, they a little bit ruled themselves out of being able to make constructive contributions to the actual debate about how much human control should there be in the use of force.

Now recently in certain publications, the UK has slightly opened up some space to recognize that that definition might actually not be so helpful, and maybe this focus on the human control element that needs to be retained is actually the most productive way forward. Now how positive the UK will be, from my perspective, in that discussion, and then talking about the level of human control that needs to be retained? I think that remains to be seen, but I think at least they’re engaging with some recognition that that’s the area where there needs to be more policy substance. So finger’s crossed.

Ariel: I’d asked Richard about the UK’s stance on autonomous weapons, but this is a global issue. I turned to Mary Wareham and Bonnie Docherty for more in-depth information about international efforts at the United Nations to ban lethal autonomous weapons.

Bonnie: My name’s Bonnie Docherty. I’m a senior researcher at Human Rights Watch, and also the director of Armed Conflict and Civilian Protection at Harvard Law School’s International Human Rights Clinic. I’ve been working on fully autonomous weapons since the beginning of the campaign doing most of the research and writing regarding the issue for Human Rights Watch and Harvard.

Mary: This is Mary Wareham. I’m the advocacy director of the Arms Division at Human Rights Watch. I serve as the global coordinator of the Campaign to Stop Killer Robots. This is the coalition of non-governmental organizations that we co-founded towards the end of 2012 and launched in April 2013.

Ariel: What prompted the formation of the Campaign to Stop Killer Robots?

Bonnie: Well, Human Rights Watch picked up this issue, we published our first report in 2012. Our concern was the development of this new technology that raised a host of concerns, legal concerns, compliance with international and humanitarian law and human rights law, moral concerns, accountability concerns, scientific concerns and so forth. We launched a report that was an initial foray into the issues, trying to preempt the development of these weapons before they came into existence because the genie’s out of the bottle, it’s hard to put it back in, hard to get countries to give up a new technology.

Mary: Maybe I can follow up there just to establish the Campaign to Stop Killer Robots. I did a lot of leg work in 2011, 2012 talking to a lot of the people that Bonnie was talking to for the preparation of the report. My questions were more about what should we do once we launch this report? Do you share the same concerns that we have at Human Rights Watch, and, if so, is there a need for a coordinated international civil society coalition to organize us going forward and to present a united voice and position to governments who we want to take action on this? For us, working that way in a coalition with other non-governmental organizations is what we do. We’ve been doing it for the two last decades on other humanitarian disarmament issues, the International Campaign to Ban Landmines, the Cluster Munition Coalition. We find it’s more effective when we all try to work together and provide a coordinated civil society voice. There was strong interest, and therefore, we co-founded the Campaign to Stop Killer Robots.

Ariel: What prompted you to consider a ban versus your trying to … I guess I don’t know other options there might have been.

Bonnie: We felt from the beginning that what was needed to address fully autonomous weapons is a preemptive ban on development, production and use. Some people have argued that existing law is adequate. Some people have argued you only need to regulate it, to limit it to certain circumstances, but in our mind a ban is essential, and that draws on past work on other conventional weapons such as landmines and cluster munitions, and more recently nuclear weapons.

The reason for a ban is that if you allow these weapons to exist, even to come into being, to be in countries’ arsenals, they will inevitably get in the hands of dictators or rogue actors that will use them against the law and against the rules of morality. They will harm combatants as well as civilians. It’s impossible once a weapon exists to restrict it to a certain circumstance. I think those who favor regulation assume the user will follow all the rules, and that’s just not the way it happens. We believe it should be preemptive because once they come into existence it’s too late. They will be harder to control, and so if you prevent them from even happening that will be the most effective solution.

The last point I’d make is that it also increases the stigma against the weapons, which can influence even countries that aren’t party to a treaty banning them. This is proven in past weapons treaties, and even there’s been a preemptive ban on blinding lasers in the 1990s, and that’s been very effective. There is legal precedent for this, and many arguments for why a ban is the best solution.

Mary: Yeah, there’s two ways of framing that call, which is not just the call of Human Rights Watch, but the call of the Campaign to Stop Killer Robots. We seek a preemptive ban on the development, production and use of fully autonomous weapons. That’s a kind of negative way of framing it. The positive way is that we want to retain meaningful human control over the use of force and over weapons systems going forward. There’s a lot of interest, and I’d say convergence on those two points.

We’re five years on since the launch of the campaign, 26 countries are now supporting the call for a ban and actively trying to get us there, and an even larger number of countries, actually, virtually all of the ones who’ve spoken to-date on this topic, acknowledge the need for some form of human control over the use of force and over weapons systems going forward. It’s been interesting to see in the five diplomatic meetings that governments have held on this topic since May 2014, the discussions keep returning to the notion of human control and the role of the human and how we can retain that going forward because autonomy and artificial intelligence are going to be used by militaries. What we want to do, though, is draw a normative line and provide some guidance and a framework going forward that we can work with.

Ariel: You just referred to them as fully autonomous weapons. At FLI we usually talk about lethal autonomous weapons versus non-lethal fully autonomous weapons, and so that sort of drives me to the question of, to what extent do definitions matter?

Then, this is probably a completely different question, how are lethal autonomous weapons different from conventional weapons? The reason I’m combining these two questions is because I’m guessing definition does play a little bit of a role there, but I’m not sure.

Bonnie: Well, it’s important for countries to make international law they have to have a general, common understanding of what we’re talking about. Generally, in a legal treaty the last thing to be articulated is the actual definition. It’s premature to get a detailed, technical definition, but we feel that, although a variety of names have been used, lethal autonomous weapon systems, fully autonomous weapons, killer robots, in essence they’re all talking about the same thing. They’re all talking about a system that can select a target and choose to fire on that target without meaningful human control. There’s already convergence around this definition, even if it hasn’t been defined in detail. In terms of conventional munitions, they are, in essence, a conventional munition if they deploy conventional weapons. It depends on what the payload is. If a fully autonomous system were launching nuclear weapons it would not be a conventional weapon. If it’s launching cluster munitions it would be a conventional. It’s not right to say they’re not conventional weapons.

Mary: The talks are being held at the Convention on Conventional Weapons in Geneva. This is where governments decided to house this topic. I think it’s natural for people to want to talk about definitions. From the beginning that’s what you do with a new topic, right? You try and figure out the boundaries of what you’re discussing here. Those talks in Geneva and the reporting that has been done to date and all of the discourse, I think it’s been pretty clear that this campaign and this focus on fully autonomous weapons is about kinetic weapons. It’s not about cyber, per se, it’s about actual things that can kill people physically.

I think the ICRC, the Red Cross, has made it an important contribution with its suggestion to focus on the critical functions of weapons systems, which is what we were doing in the campaign, we just weren’t calling it that. That’s this action of identifying and selecting a target, and then firing on it, using force, lethal or otherwise. Those are the two functions that we want to ensure remain under human control, under meaningful human control.

For some others, some other states, they like to draw what we call the very wide definition of meaningful human control. For some of them it means good programming, nice design, a weapons review, a kind of legal review of if the weapon system will be legal and if they can proceed to develop it. You could kind of cast a very wide loop when you’re talking about meaningful human control, but for us the crux of the whole thing is about this notion of selecting targets and firing on them.

Ariel: What are the concerns that you have about this idea of non-human control? What worries you about that?

Mary: Of autonomy in weapon systems?

Ariel: Yeah, essentially, yes.

Mary: We’ve articulated legal concerns here at Human Rights Watch just because that’s where we always start, and that’s Bonnie’s area of expertise, but there are much broader concerns here that we’re also worried about, too. This notion of crossing a moral line and permitting a machine to take human life on the battlefield or in policing or in border control and other circumstances, that’s abhorrent, and that’s something that the Nobel Peace Laureates, the faith leaders and the others involved in the Campaign to Stop Killer Robots want to prevent. For them that’s a step too far.

They also worry about outsourcing killing to machines. Where’s the ethics in that? Then, what impact is this going to have on the system that we have in place globally? How will it be destabilizing in various regions, and, as a whole, what will happen when dictators and one-party states and military regimes get ahold of fully autonomous weapons? How will they use them? How will non-state armed groups use them?

Bonnie: I would just add, building on what Mary said, another reason human control is so important is that humans bring judgment. They bring legal and ethical judgment based on their innate characteristics, on their understanding of another human being, of the mores of a culture, and that a robot cannot bring, certain things cannot be programmed. For example, when they’re weighing whether the military advantage will justify an attack if it causes civilian harm, they apply that judgment, which is both legal and ethical. A robot won’t have that, that’s a human thing. Losing humanity in use of force, potentially, violate the law, and as well as raise serious moral concerns that Mary discussed.

Ariel: I want to go back to the process to get these weapons banned. It’s been going on for quite a few years now. I was curious, is that slow, or is that just sort of the normal speed for banning a weapon?

Mary: Look at nuclear weapons, Ariel.

Ariel: Yeah, that’s a good point. That took a while.

Mary: That took so many years, you know? That’s the example that we’re trying to avoid here. We don’t want to be negotiating a non-proliferation treaty in 20 years time with the small number of countries who’ve got these and the other states who don’t. We’re at a crossroads here. Sorry to interrupt you.

Ariel: No, that was a good point.

Mary: There have been five meetings on this topic to date at the United Nations in Geneva, but each of those meetings has only been up to a week long, so, really, it’s only five weeks of talks that have happened in the last four years. That’s not much time to make a lot of progress to get everybody around the same table understanding, but I think there’s definitely been some progress in those talks to delineate the parameters of this issue, to explore it and begin to pull apart the notion of human control and how you can ensure that that’s retained in weapons systems in the selection of targets and the use of force. There’s a wide range of different levels of knowledge on this issue, not just in civil society and academia and in the public, but also within governments.

There’s a lot of leg work to be done there to increase the awareness, but also the confidence of governments to feel like they can deal with this. What’s happened, especially I think in the past year, has been increased calls to now move from exploring the issue and talking about the parameters of the challenge to, “What are we good do about it?” That’s going to be the big debate at the next meeting, which is coming up at the end of August, is what will the recommendation be for future work? Are the governments going to keep talking about this, which we hope they do, but what are they going to do about it, more importantly?

We’re seeing, I think, a groundswell of support now for moving towards an outcome. States realize that they do not have the time or the money to waste on inconclusive deliberations, and so they met to be exploring options on pathways forward, but there’s really not that many options. As has been mentioned, states can talk about international law and the existing rules and how they can apply them and have more transparency there, but I think we’ve moved beyond that.

There’s kind of a couple of possibilities which will be debated. One is political measures, political non-binding declaration. Can we get agreement on some form of principles over human control? That sounds good, but it doesn’t go nearly far enough. We could create new international law. How do we do that in this particular treaty at the Convention on Conventional Weapons? You move to a negotiating mandate, and you set the objective of negotiating a new protocol under the Convention on Conventional Weapons. At the moment, there has been no agreement to move to negotiate new international law, but we’re expecting that to be the main topic of debate at the next meeting because they have to decide now what they’re going to do next year.

For us, the biggest, I think, developments are happening outside of the room right now rather than in Geneva itself. There’s a lot of activity now starting to happen in national capitols by governments to try and figure out what their position is on this, what their policy is on this, but there’s more prodding and questioning and debate starting to happen in national parliaments, and that has to happen in order to determine what the government position is on this and what’s going to happen on it. Then we have the examples of the open letters, the sign-on letters, ethical principles, there’s all sorts of new things that are coming out in recent weeks that I think will be relevant to what the governments are discussing, and we hope will provide them with impetus to move forward with focus and purpose here.

We can’t put a timeline on by when they might create a new international treaty, but we’re saying you can do this quickly if you put your mind to it and you say that this is what you want to try and achieve. We believe that if they move to a negotiating mandate at the end of this year, they could negotiate the treaty next year. Negotiating the treaty is not the part that takes the long time. It’s about getting everybody into the position where they want to create new international law. The actual process of negotiating that law should be relatively swift. If it takes longer than a year or two, then it runs the risk of turning into another set of inconclusive deliberations that don’t produce anything. For us, the goal is absolutely crucial to get in there at the beginning. The goal at the moment has gone from informal talks to formal talks, but, still, with no option or outcome.

Ariel: What is some of the resistance that you’re facing to moving towards a ban? Are governments worried that they’re going to miss out on a great technology, or is there some other reason that they’re resisting?

Mary: Just to say, 85 countries have spoken out on this topic to date. Most of them not at any great length, but just to say, “This is important. We’re concerned. We support the international talks.” We have a majority of countries now who want to move towards negotiating new international law. Who’s the blockages at the moment? At the last round of talks and at the previous ones it was basically Israel, Russia and the United States who were saying it’s premature to decide where these talks should lead. We need to further explore and discuss the issues before we can make any progress. For others, now people are less patient with that position, and it will be interesting to see if those three countries in particular change their minds here.

The particular treaty that we’re at, the Convention on Conventional Weapons, the states there take their decisions by consensus, which means they can’t vote. There’s no voting procedures there. They have to strive for consensus where everybody in the room agrees, or at least does not object with moving forward. That threat of a kind of a blocking of consensus is always there, especially from Russia, but we’ll see. There’s no kind of pro-killer robot state which is saying, “We want these things. We need these things,” right now, at least not in the diplomatic talks. The only countries who have wanted to talk about the potential advantages or benefits are Israel and the United States. All of the other countries who speak about this are more concerned about understanding and coming to grips with all of the challenges that are raised, and then figuring out what the regulatory framework should be.

Ariel: Bonnie, was there anything you wanted to add to that?

Bonnie: I think Mary summarized the key points. I was just going to say that there’s some people who would argue that we should wait and see what the technology would bring, we don’t know where it’ll go. Our argument counter to that is something called the precautionary principle, that even if there’s scientific uncertainty about where a technology will go, if there’s a significant risk of public harm, which there is in this case, that the scientific uncertainty should not stand in the way of action. I think that the growing number of states that have expressed concern about these weapons, and the majority, the almost consensus or the merging around the need for human control show that there is willingness to act at this point. As Mary said, this is not a situation where people are advocating, and I think that in the long run the agreement that there should be human control over the use of force will outweigh any hesitation based on the wait-and-see approach.

Mary: We had a good proposal, or not proposal, but offer from the United Nations Secretary General in this big agenda for disarmament framework that he launched a couple of months ago, saying that he stands ready to support the efforts of UN member states to elaborate new measures on lethal autonomous weapon systems, including legally-binding arrangements. For him, he wants states to ensure that humans remain at all times in control over the use of force. To have that kind of offer of support from the highest level at the United Nations I think is very important.

The other recent pledges and commitments, the one by the 200 technology companies and more than 2600 scientists and AI experts and other individuals committing not to develop lethal autonomous weapons systems, that’s a very powerful message, I think, to the states that these groups and individuals are not going to wait for the regulation. They’re committing not to do it, and this is what they expect the governments to do as well. We also saw the ethical principles issued by Google in recent weeks and this pledge by the company not to design or develop artificial intelligence for use in weapons. All of these efforts and initiatives are very relevant to what states need to do going forward. This is why we in the Campaign to Stop Killer Robots welcome them and encourage them, and want to ensure that we have as much of a broad-based appeal to support the government action that we need taken.

Ariel: Can you talk a little bit about what’s happening with China? Because they’ve sort of supported a ban. They’re listed as supporting a ban, but it’s complicated.

Mary: It’s funny because so many other countries that have come forward and endorsed the call for a ban have not elicited the same amount of attention. I guess it’s obviously interesting, though, for China to do this because everybody knows about the investments that China is making into military applications of artificial intelligence and autonomy. We see the weapons systems that are in development at the moment, including swarms of very small miniature drones, and where will that head?

What China thinks about this issue matters. At the last meeting, China basically endorsed the call for a ban, but said — there’s always a but — that their support was limited to prohibiting use only, and to not address development or production. For us it’s a partial ban, but we put them on the list that the campaign maintains, and they’re the first state to have an asterisk by its entry saying, “Look, China is on the ban list, but it’s not fully committed here.” We needed to acknowledge that because it wasn’t really the first that China had hinted it would support creating new international law. It has been hinting at this in previous papers, including one that found that China’s review of existing international law found so many questions and doubts raised that it does see a need to create international law specific to fully autonomous weapons systems. China gave the example of the blinding lasers protocol at the CCW which prohibits laser weapons that would permanently blind human soldiers.

I think the real news on China is that its position now saying that existing law is insufficient and we need to create new international rules, splits the P5, the permanent five members of the United Nations Security Council. You have Russia and the United States arguing that it’s too early to determine what the outcome should be, and the UK — Richard can explain better exactly what the UK wants — but it seems to be satisfied with the status quo. Then France is pursuing a political declaration, but not legally-binding measures. There’s not unity anymore in that group of five permanent members of the Security Council, and those states do matter because they are some of the ones who are best-placed to be developing and investing in increasingly autonomous weapons systems.

Ariel: Okay. I wanted to also ask, unrelated, right now what you’re trying to do, what we’re trying to do, is get a ban, a preemptive ban on a weapon that doesn’t exist. What are some examples in the past of that having succeeded, as opposed to proving some humanitarian disaster as the result of a weapon?

Bonnie: Well, the main precedent for that is the preemptive ban on blinding lasers, which is a protocol to the Convention on Conventional Weapons. We did some research a few years ago into the motives behind the preemptive ban on blinding lasers, and many of them are the same. They raised concerns about the ethics of permanently blinding someone, whether it’s a combatant or a civilian. They raised concerns about the threat of an arms race. They raised concerns that there be a ban, but that it not impede peaceful development in that area. That ban has been very successful. It has not impeded the peaceful use of lasers for many civilian purposes, but it has created a stigma against and a legally-binding ruling against using blinding lasers. We think that that’s an excellent model for fully autonomous weapons, and it also appeared in the same treaty at which these fully autonomous weapons or lethal autonomous weapon systems are being discussed right now. It’s a good model to look at.

Mary: Bonnie, I really like that paper that you did on the other precedents for retaining human control over weapons systems. The notion that looking at past weapons that have been prohibited and finding that, in many instances, it’s because of the uncontrollable effects that the weapons create, from chemical weapons and biological and toxin ones to antipersonnel landmines where, once deployed, you cannot control them anymore. This is the kind of notion of being able to control the weapon system once it’s activated that has driven those previous negotiations, right?

Bonnie: Correct. There’s precedent for both a preemptive ban, but there’s also precedent for a desire to maintain human control over weapons. As Mary said, there are several treaties, chemical weapons, biological weapons and landmines, all have been banned, in large part because people in governments were concerned about losing control over the weapons system. In essence, it’s the same model here, that by launching fully autonomous weapons you’d be losing control over the use of force. I think there’s a precedent for a ban, and there’s a precedent for a preemptive ban, all of which are applicable in this situation.

Ariel: I talked to Paul Scharre a little bit earlier, and one of the things that he talked about were treaties that were developed as a result of the powers that be, recognizing that the weapon would be too big of a risk for them, and so they agreed to ban a weapon. Then, the other sort of driving force for treaties was usually civil societies and based on sort of the general public saying, “This is not okay.” What role do you see for both of those situations here?

Bonnie: There’s a multitude of reasons of why these weapons should be banned, and I think both the ones you mentioned are valid in this case. From our point of view, the main concern is a humanitarian one, and that’s civil society’s focus. We’re concerned about the risk to civilians. We’re concerned about moral issues, and those matters. That builds on past, what they call humanitarian disarmament treaties, treaties designed to protect humanity through legal norms, and, traditionally, often through bans, bans of landmines, cluster munitions and nuclear weapons.

There have been other treaties, sometimes they overlap, that have been driven more for security reasons. Countries that are concerned about other nations getting their hands on these weapons, and that they feel in the long run it’s better for no one to have them than for others to have them. Certainly, chemical weapons was an example of that. This does not mean that a treaty can’t be motivated for both reasons. That often happens, and I think both reasons are applicable here, but they just have come from slightly different trajectories.

Mary: It’s pretty amazing some of the diplomatic talks that we’ve been on on killer robots where we hear the governments debating the ethics of whether or not a specific weapon system such as fully autonomous weapons should be permitted, should be allowed. It’s rare that that happens. Normally, we are dealing with the aftermath of the consequences of proliferation and of widespread use and widespread production and stockpiling. This is an opportunity to do something in advance here, and it does kind of lead to a little bit of, I’d say, a North-South divide between the kind of military powers who have the resources at their disposal to invest in increasingly autonomous technology and try and push the boundaries, and then the vast majority of countries who are asking, “What’s the point of all of this? Where is the relevance of the UN charter which talks about general and complete disarmament as being the ultimate objective?” They ask, “Have we lost that goal here? Is the ultimate objective to create more and better and more sophisticated weapons systems, or is to end war and deal with the consequences through disarmament of warfare?”

Those are kind of really big-picture questions that are raised in this debate, and ones that we leave to those governments to make, but I think it is indicative of why there is so much interest in this particular concern, and that’s demonstrated by just the sheer number of governments who are participating in the international talks. The international talks, they’re in the setting called a Group of Governmental Experts, but this is not about a dozen guys sitting around the table in a small room. This is a big plenary meeting with more than 80 countries following, engaging, and avidly trying to figure out what to do.

Ariel: In terms of just helping people understand how the UN works, what role does a group like the Campaign to Stop Killer Robots play in the upcoming meeting? If, ultimately, the decision is made by the states and the nations, what is your role?

Mary: Our role is 24/7, all year round. These international meetings only happen a couple of times a year. This will be the second week this year. Most of our work has been this year happening in capitols and in places outside of the diplomatic meetings because that’s where you really make progress, is through the parliamentary initiatives, through reaching the high-level political leadership, through engaging the public, through talking to the media and getting an increased awareness about the challenges here and the need for action. All of those things are what makes things move inside the room with the diplomacy because the diplomats need instructions from capitols in order to really progress.

At the meeting itself, we seek to provide a diverse delegation that’s not just people from Europe and North America, but from around the world because this is a multilateral meeting. We need to ensure that we can reach out and engage with all of the delegates in the room because every country matters on this issue, and every country has questions. Can we answer all those questions? Probably not, but we can talk through them with those states, try and address the concerns, and try and be a valued partner in the deliberations that are happening. It’s the normal way of working for us here at Human Rights Watch, is to work alongside other organizations through coordinated civil society initiatives so that you don’t go to the meeting and have like 50 statements from different NGOs. You have just a few, or just one so that you can be absolutely clear and guiding where you want to see the deliberations go and the outcome that you want.

We’ll be holding side events and other efforts to engage with the delegates in different ways, as well as presenting new research and reports. I think you’ve got something coming out, Bonnie, right?

Bonnie: We’ll be releasing a new report on Martens Clause, which is a provision of international law, the Geneva conventions and other treaties that brings ethics into law. It basically has two prongs, which we’ll elaborate on in the report, but talking about that countries must comply with the principles of humanity and the dictates of public conscience, which, in short, we believe fully autonomous weapons raise concerns over both of those. We believe losing human control will violate basic principles of humanity, and that there’s the groundswell of opposition that’s growing among, not only governments, but also faith leaders, scientists, tech companies, academics, civil society, et cetera, all show that the public conscience is coming out against fully autonomous weapons and for maintaining human control over the use of force.

Ariel: To continue with this idea of the ethical issues surrounding lethal autonomous weapons, we’re joined now by Peter Asaro.

Peter: I’m Peter Asaro. I’m an Associate Professor in the School of Media Studies at the New School University in New York City, and I’m also the co-founder and vice chair of the International Committee for Robot Arms Control, which is part of the leadership steering committee of the Campaign to Stop Killer Robots, which is a coalition of NGOs that’s working at the UN to ban fully autonomous weapons.

Ariel: Could you tell us a little bit about how you got involved with this and what first gave you cause for concern?

Peter: My background is in philosophy and computer science, and I did a lot of work in artificial intelligence and in the philosophy of artificial intelligence as well as the history of science and early computing and the development of neural networks and the sort of mathematical and computational theories behind all of that. In the 1930s, ’40s, ’50s, and ’60s was my graduate work, and as part of that, I got really interested in the kind of modern or contemporary applications of both artificial intelligence and robotics, and specifically the kind of embodied forms of artificial intelligence, which are robotic in various ways, and got really interested in not just intelligence, but social interaction.

That sort of snowballed into thinking about robot ethics and what seems the most pressing issue within robot ethics was the use of violence, the use of force, and whether we would allow robots to kill people, and of course the first place that that was gonna happen would be the military. So, I’d been thinking a lot about the ethics of military robotics form the perspective of just war theory, but also a broad range of philosophical legal perspectives as well.

That got me involved with Noel Sharkey and some other people who were interested in this from a policy perspective and we launched the International Committee for Robot Arms Control back in 2009, and then in 2012, we got together with Human Rights Watch and a number of other NGOs to form the Campaign to Stop Killer Robots.

Ariel: That leads into the next question I have for you, and it’s very broad. Can you talk a little bit about what some of the ethical issues are surrounding robots and more specifically autonomous weapons in warfare?

Peter: I think of course there’s a whole host of ethical issues around robotics in general and privacy, safety, sort of the big ones, but all sorts of more complicated ones as well, job displacement, how we treat them, and the impacts on society and things like that. Within the military context, I think the issues are sort of clearer in some sense, because it’s mostly around the use autonomous systems in a lethal force.

So the primary question is should we allow autonomous weapons systems to make lethal decisions independently of human control or human judgment, however you frame that. And then sort of subsidiary to that, some would argue does the programming within a system constitute that kind of human control or decision making. From my perspective, pre-programming doesn’t really do that, and that’s because I come from a philosophical background and so we look at just war theory and you look at ethics, especially Kantian ethics, and the requirements for the morality of killing. So, killing is generally speaking immoral, but there are certain exceptions, and those are generally self-defense or collective self-defense in the case of war, but in order to justify that killing, you need reasons and justifications. And machines, and computational reasoning, at least at this stage of development, is not the type of system that has reasons. It follows rules and if certain conditions are met and a rule is applied and a result is obtained, but making a reasoned judgment about whether to use lethal force or whether to take a human life depends on a deeper understanding of reason, and I think that’s a sort of moral agency, it’s a moral decision making, and moral judgment that requires capacities that automated decision making systems just don’t have.

Maybe down the road in the future, machines will become conscious, machines will understand the meaning of life, machines will understand what it means to take a life, machines will be able to recognize human beings as humans who deserve rights that need to be respected, and systems may understand what it means to have a duty to respect the rights of others. But simply programming rules into machines doesn’t really do that. So, from a legal perspective as well, there’s no real accountability for these sorts of systems because they’re not legal agents, they’re not moral agents, you cannot sue a computer or a robot. You cannot charge them with crimes and put them in jail and things like that.

So, we have an entire legal system as well as a moral framework that assumes that humans are the responsible agents and the ones making decisions, and as soon as you start replacing that decision making with automated systems, you start to create significant problems for the regulation of these systems and for accountability and for justice. And then that leads directly to problems of safety and control, and what kinds of systems are gonna be fielded, what are gonna be the implications of that for international stability, who’s gonna have access to that, what are the implications for civilians and civilian infrastructures that might be targeted by these systems.

Ariel: I had wanted to go into some of this legality and liability stuff that you’ve brought up and you sort of given a nice overview of it as it is, but I was hoping you could expand a little bit on how this becomes a liability issue, and also … This is probably sort of an obvious question, but if you could touch a little on just how complicated it is to change the laws so that they would apply to autonomous systems as opposed to humans.

Peter: A lot of the work I’ve been doing under a grant for the Future of Life Institute, looks at liability in increasingly autonomous systems. I know within civilian domestic application, of course the big application that everybody’s looking at at the moment is the self-driving car, so you can ask this question, who’s responsible when the self-driving car creates an accident. And the way that liability law works, of course somebody somewhere is always going to wind up being responsible. The law will find a way to hold somebody responsible. The question is whether existing precedence and the ways of doing things under current legal frameworks is really just or is really the best way going forward as we have these kinds of increasingly autonomous systems.

So, in terms of holding persons responsible and liable, so under tort law, if you have an accident, then you can sue somebody. This isn’t criminal law, this is the law of torts, and under that, then you sort of receive monetary compensation for damages done. But ideally, the person, or agents, or company or what have you that causes the harm is the one that should pay. Of course, that’s not always true, and the way that liability works, does things like joint and several liability in which, even though one party only had a small hand in causing a harm, they may have lots of money, like a government or a state, or a city, or something like that, and so they may actually wind up paying far more as a share of damages than they actually contributed to a problem.

You also have situations of strict liability such that even if your agency in causing a problem was very limited, you can still be held fully responsible for the implications. There’s some interesting parallels here with the keeping of animals, which are kind of autonomous systems in a sense. They have their minds of their own, they sort of do things. On the other hand, we expect them to be well behaved and well trained, at least for domestic animals. So generally speaking, you have liability for harms caused by your dog or your horse and so forth as a domesticated animal, but you don’t have strict liability. So, you actually have to show that maybe you’ve trained your dog to attack or you’ve failed to properly train your horse or keep in a stable or what have you, whereas if you keep a tiger or something like that and it gets out and causes harm, then you’re strictly liable.

So the question is for a robot, should you be strictly liable for the robots that you create or the robots that you own? Should corporations that manufacture these systems be strictly liable for all of the accidents of self-driving cars? And while that seems like a good policy from the perspective of the public, because all the harms that are caused by these systems will be compensated, that could also stifle innovation. In the car sector, that doesn’t seem to be a problem. As it turns out, the president of Volvo said that they will accept strict liability for all of their self-driving cars. Tesla Motors has released a number of autopilot systems for their cars and more or less accepted the liability for that, although there’s only been a few accidents, so the actual jurisprudence or case law is still really emerging around that.

But those are, I think, a technology where the cars are very expensive, there’s a lot of money to be made in self-driving cars, and so the expectation of the car companies is that there will be very few accidents and that they can really afford to pay the damages for all those accidents. Now, is that gonna be true for personal robots? So, if you have a personal assistant, sort of butler robot who maybe goes on shopping errands and things like that for you, there’s a potential for them to cause significant economic damage. They’re probably not gonna be nearly as expensive as cars, hopefully, and it’s not clear that the market for them is going to be as big, and it’s not clear that companies would be able to absorb the cost of strict liability. So, there’s a question of whether that’s really the best policy for those kinds of systems.

Then there’s also questions of ability of people to modify their systems, so if you’re holding companies strictly responsible for their products, then those companies are not going to allow consumers to modify those products in any way, because that would affect their ability to control them. If you want a kind of DIY culture around autonomous systems of robotics, then you’re gonna see a lot of people modifying these systems, reprogramming these systems. So you also want, I think, a kind of strict liability around anybody who does those kinds of modifications rather than the manufacturer, and that’s to sort of break the seal and you accept all the responsibility for what happens.

And I think that’s sort of one side of it now and the military side of it, you don’t really have torts in the same way. There’s of course a couple of extreme issues around torts in war, but generally speaking, militaries do not pay monetary damages when they make mistakes. If they accidentally blow up the wrong building, they don’t pay to build a new building. That’s just considered a casualty of war and an accident, and it’s not even necessarily a war crime or anything else, because you don’t have these kind of mechanisms where you can sue an invading army for dropping a bomb in the wrong place.

The idea that liability is going to act as an accountability measure on autonomous system is just silly, I think, in warfare, because you just, you can’t sue people in war, basically. There’s a few exceptions and the governments that purchase weapons systems can sue the manufacturers, and that’s the sort of sense in which there is an ability to do that, but even most of those cases have been largely unsuccessful. Generally, those kinds of lawsuits are based on contracts and not the actual performance or damages caused by an actual system. So, you don’t really have that entire regulatory mechanism, so if you have a government that’s concerned about not harming civilians and not bombing the wrong buildings and things like that, of course, then they’re incentivized to put pressure on manufacturers to build systems that perform well, and that’s one of the sort of drivers of that technology.

But it’s a much weaker force if you think about what the engineers in a car company are thinking about in terms of safety and the kind of bottom line for their company if they make a product that causes accidents versus how that’s thought about in a defense company, where certainly they’re trying to protect civilians and ensure that systems work correctly, but they don’t have that enormously powerful economic concern about lawsuits in the future. The idea that the technology is going to be driven by similar forces, it doesn’t really apply. So that’s a big concern, I think, for the development of autonomous systems in the military sphere.

Ariel: Is there a worry or a risk that this sort of — I don’t know if it’s lack of liability, maybe it’s just whether or not we can trust the systems that are being built — but is there an increased risk of war crimes as a result of autonomous weapons, either intentionally or accidentally?

Peter: Yeah, I mean, the idea that there’s an increased risk of war crimes is kind of an interesting question, because the answer is simultaneously yes and no. What these autonomous systems actually do is diminish or remove, or put a distance between accountability of humans and their actions, or the consequences of their actions. So if you think of the autonomous system as a sort of intermediary between humans and the effects of their actions, there’s this sort of accountability gap that gets created. A system could go and do some horrendous act, like devastate a village and all the civilians in the village, and then we say, “Ah, is this a war crime?” And under international law as it stands, you’d have to prove intention, which is usually the most difficult part of war crimes tribunals, being able to actually demonstrate in court that a commander had the intention of committing some genocidal act or some war crime.

And you can build various forms of evidence for that. Now, if you send out an autonomous system, and you may not even know what that system is really gonna do and you don’t need to know exactly what it’s going to do when you give its orders, it becomes very easy to sort of distance yourself legally from what that system does in the field. Maybe you suspect it might do something terrible, and that’s what you really want, but it would be very easy then to sort of cover up your true intentions using these kinds of systems.

On the one hand, it would be much easier to commit war crimes. On the other hand, it’ll be much more difficult to prosecute or hold anybody accountable for war crimes that would be committed by autonomous weapons.

Ariel: You’ve also been producing some open letters this summer. There was one for academics calling on Google to stop work on Project Maven and … I’m sorry, you had another one… what was that one about?

Peter: The Amazon face recognition.

Ariel: Right. Right. Yeah. I was hoping you could talk a little bit about what you see as the role of academics and corporations and civil society in general in this debate about lethal autonomous weapons.

Peter: I think in terms of the debate of lethal autonomous weapons, civil society has a crucial role to play. I think in a broad range of humanitarian disarmament issues, and in the case of autonomous weapons, it’s really, it’s a technology that’s moving very quickly, and militaries are still a little bit unsure of exactly how they’re going to use it, but they’re very excited about it and they’re putting lots of research investment into new applications and trying to find new ways of using it. And I think that’s exciting from a research perspective, but it’s very concerning from a humanitarian and human rights perspective, because again, it’s not clear what kind of legal accountability will be around these systems. It’s not clear what kind of safety, control, and testing might be imposed on these systems, and it also seems quite clear that these systems are ready made for arms races and global and regional military destabilizations, where competitors are acquiring these systems and that has a potential to lead to conflict because of that destabilization itself. Then of course, the rapid proliferation.

So, in terms of civil society’s role, I think what we’ve been doing primarily is voicing of the general concern, I think, of the broad public globally and within specific countries that we’ve surveyed are largely opposed to these systems. Of course, the proponents say that’s just because they’ve seen too many sci fi movies and these things are gonna be just fine, but I don’t think that’s really the case. I think there’s some genuine fears and concerns that need to be addressed. So, we’ve also seen the involvement of a number of tech companies that are developing artificial intelligence, machine learning, robotics, and things like that.

And I think their interest and concern in this issue is twofold. We have companies like Clearpath Robotics, which is the largest robotics company in Canada, and also the largest supplier of robots to the Canadian military, whose engineers organized together to say that they do not want their systems to be used for autonomous weapons platforms, and they will not build them, but they also want to support the international campaign to ensure that governments don’t acquire their robots and then weaponize them. And they’re doing search and rescue robots and bomb disposal robots. This similar movement amongst academics and artificial intelligence and robotics who have spent really their life work developing these fundamental technologies who are then deeply concerned that the first and perhaps last application of this is going to be autonomous weapons, and the public will turn against artificial intelligence and robotics because of that, and then that these systems are genuinely scary and that we shouldn’t really be entrusting human lives or the decision to take human lives to these automated systems.

They have all kinds of great practical social applications and we should be pursuing those and just leave aside and really prohibit the use of these systems in the military context for autonomous targeting. And now I think we’re seeing more movement from the big companies, particularly this open letter that we’re a part of with Google, and their Project Maven. And Project Maven is a Pentagon project that aims at analyzing all the many thousands of hours of drone footage that the US military drones are collecting over Afghanistan and Iraq and various places where they’re operating. And to try to automate, using machine learning, to identify objects of interest, to kind of save time for human sensor analysts who have to pour through these images and then try to determine what that is.

And that in and of itself, that doesn’t seem too terrible, right? You’re just scanning through this imagery. But of course, this is really the first step to an automated targeted recognition system for drones, so if you wanted to fully automate drones, which currently require human operators to interpret the imagery to decide that this is something that should be targeted with a weapon and then to actually target and fire a weapon, that whole process is still controlled by humans. But if you wanted to automate it, the first thing you’d have to do is automate that visual analysis piece. So, Project Maven is trying to do exactly that, and to do that on a really big scale.

The other kind of issue from the perspective of a labor and research organization is that the Pentagon really has trouble, I think, attracting talent. There’s a really strong demand for artificial intelligence researchers and developers right now, because there’s so many applications and there’s so much business opportunity around it. It actually turns out the military opportunities are not nearly as lucrative as a lot of the other business applications. Google, and Amazon, and Facebook, and Microsoft can offer enormous salaries to people with PhDs in machine learning or even just masters degrees or some experience in systems development. And the Pentagon can’t compete with that on government salaries, and I think they’re even having trouble getting certain contracts with these companies. But when they get a contract with a company like Google, then they’re able to get access to really the top talent in artificial intelligence and their Cloud research groups and engineering, and also the sort of enormous capacity computationally of Google that has these massive data centers and processing capabilities.

And then you’re also getting … in some ways, Google is a company that collects data about people all over the world every day, all the time. Every Google search that you do, and there’s millions of Google searches per second or something in the world, so they have also the potential of applying the data that’s collected on the public in all these complicated ways. It’s really kind of a unique company in these respects. I think as a company that collects that kind of private data, they also have a certain obligation to society to ensure that that data isn’t used in detrimental ways, and siding with the single military in the world and using data that might be coming from users in countries where that military is operating, I think that’s deeply problematic.

We as academics kind of lined up with the engineers and researchers at Google who were already protesting Google’s involvement in this project. They were concerned about their involvement in the drone program. They were concerned about how this could be applied to autonomous weapons systems in the future. And they were just generally concerned with Google’s attempts to become a major military contractor and not just selling a simple service, like a word processor or a search, which they do anyway, but actually developing customized systems to do military operations, analyze these systems and apply their engineering skills and resources to that.

So, we really joined together as academics to support those workers. The workers passed around an open letter and then we passed around our letter, so the Google employees letter received over 4000 signatures and our letter from academics received almost 1200, a few shy. So, we really got a lot of mobilization and awareness, and then Google agreed to not renew that contract. So, they’re not dropping it, they’re gonna continue it till the end of the year, but they have said that they will not renew it in the future.

Ariel: Is there anything else that you think is important to mention?

Peter: I wrote a piece last night for a report on human dignity. So, I can just give you a little blurb about human dignity. I think the other kind of interesting ethical question around autonomous systems is this question of the right to human dignity and whether autonomous weapons or allowing robots to kill people would violate human dignity. I think some people have a very simplistic notion of human dignity, that it’s just some sort of aura or something of property that hangs around people and can be violated, but in fact I believe human dignity is a relation between people and this is a more Kantian view that human dignity means that you’re respected by others as a human. Others respect your rights, which doesn’t mean they can never violate them, but they have to have reasons and justifications that are sound in order to override your rights.

And in the case of human dignity, of course you can die in many terrible ways on a battlefield, but the question is whether the decision to kill you is justified and if it’s not, then it’s sort of an arbitrary killing. That means there’s no reasons for it, and I think if you look at the writings of the Special Rapporteur on extrajudicial summary on arbitrary executions, he’s written some interesting papers on this, which is essentially that all killing by autonomous weapons would be arbitrary in this kind of legal sense, because these systems don’t have access to reasons for killing you to know that it’s actually justified to use lethal force in a given situation.

And that’s because they’re not reasoning in the same way that we are, but it’s also because they’re not human moral agents, and it’s important in a sense that they be human, because human dignity is something that we all lose when it’s violated. So, if you look at slavery or you look at torture, it’s not simply the person who’s being tortured or enslaved who is suffering, though of course they are, but it is in fact all of us who lose a certain value of human life and human dignity by the very existence of slavery or torture, and the acceptance of that.

In a similar way, if we accept the killing of humans by machines, then we’re really diminishing the nature of human dignity and the value of human life, in a broad sense that affects everybody, and I think that’s really true, and I think we really have to think about what it means to have human control over these systems to ensure that we’re not violating the rights and dignity of people when we’re engaged in armed conflict.

Ariel: Excellent. I think that was a nice addition. Thank you so much for taking the time to do this today.

We covered a lot of ground in these interviews, and yet we still only scratched the surface of what’s going on in the debate on lethal autonomous weapons. If you want to learn more, please visit autonomousweapons.org and visit the research and reports page. On the FLI site, we’ve also addressed some of the common arguments we hear in favor of lethal autonomous weapons, and we explain why we don’t find those arguments convincing. And if you want to learn even more, of course there’s the Campaign to Stop Killer Robots website, ICRAC has a lot of useful information on their site, and Article 36 has good information, including their report on meaningful human control. And if you’re also concerned about a future with lethal autonomous weapons, please take a moment to sign the pledge. You can find links to the pledge and everything else we’ve talked about on the FLI page for this podcast.

I want to again thank Paul, Toby, Richard, Mary, Bonnie and Peter for taking the time to talk about their work with LAWS.

If you enjoyed this show, please take a moment to like it, share it and maybe even give it a good review. I’ll be back again at the end of next month discussing global AI policy. And don’t forget that Lucas Perry has a new podcast on AI value alignment, and a new episode from him will go live in the middle of the month.

[end of recorded material]

AI Alignment Podcast: AI Safety, Possible Minds, and Simulated Worlds with Roman Yampolskiy

What role does cyber security play in AI alignment and safety? What is AI completeness? What is the space of mind design and what does it tell us about AI safety? How does the possibility of machine qualia fit into this space? Can we leak proof the singularity to ensure we are able to test AGI? And what is computational complexity theory anyway?

AI Safety, Possible Minds, and Simulated Worlds is the third podcast in the new AI Alignment series, hosted by Lucas Perry. For those of you that are new, this series will be covering and exploring the AI alignment problem across a large variety of domains, reflecting the fundamentally interdisciplinary nature of AI alignment. Broadly, we will be having discussions with technical and non-technical researchers across areas such as machine learning, AI safety, governance, coordination, ethics, philosophy, and psychology as they pertain to the project of creating beneficial AI. If this sounds interesting to you, we hope that you will join in the conversations by following us or subscribing to our podcasts on Youtube, SoundCloud, or your preferred podcast site/application.

If you’re interested in exploring the interdisciplinary nature of AI alignment, we suggest you take a look here at a preliminary landscape which begins to map this space.

In this podcast, Lucas spoke with Roman Yampolskiy, a Tenured Associate Professor in the department of Computer Engineering and Computer Science at the Speed School of Engineering, University of Louisville. Dr. Yampolskiy’s main areas of interest are AI Safety, Artificial Intelligence, Behavioral Biometrics, Cybersecurity, Digital Forensics, Games, Genetic Algorithms, and Pattern Recognition. He is an author of over 100 publications including multiple journal articles and books. 

Topics discussed in this episode include:

  • Cyber security applications to AI safety
  • Key concepts in Roman’s papers and books
  • Is AI alignment solvable?
  • The control problem
  • The ethics of and detecting qualia in machine intelligence
  • Machine ethics and it’s role or lack thereof  in AI safety
  • Simulated worlds and if detecting base reality is possible
  • AI safety publicity strategy
In this interview we discuss ideas contained in upcoming and current work of Roman Yampolskiy. You can find them here: Artificial Intelligence Safety and Security and Artificial Superintelligence: A Futuristic Approach You can find more of his work at his Google Scholar and/or university page and follow him on his Facebook or Twitter.  You can hear about this work in the podcast above or read the transcript below.

Lucas: Hey everyone, welcome back to the AI Alignment Podcast Series with the Future of Life Institute. I’m Lucas Perry and today, we’ll be speaking with Dr. Roman Yampolskiy. This is the third installment in this new AI Alignment Series. If you’re interested in inverse reinforcement learning or the possibility of astronomical future suffering being brought about by advanced AI systems, make sure to check out the first two podcasts in this series.

As always, if you find this podcast interesting or useful, make sure to subscribe or follow us on your preferred listening platform. Dr. Roman Yampolskiy is a tenured associate professor in the Department of Computer Science and Engineering at the Speed School of Engineering at the University of Louisville. He is the founding and current director of the Cybersecurity Lab and an author of many books including Artificial Superintelligence: A Futuristic Approach.

Dr. Yampolskiy’s main areas of interest are in AI safety, artificial intelligence, behavioral biometrics, cybersecurity, digital forensics, games, genetic algorithms and pattern recognition. Today, we cover key concepts in his papers and books surrounding AI safety and artificial intelligence superintelligence and AGI, his approach to AI alignment, how AI security fits into all this. We also explore our audience-submitted questions. This was a very enjoyable conversation and I hope you find it valuable. With that, I give you Dr. Roman Yampolskiy.

Thanks so much for coming on the podcast, Roman. It’s really a pleasure to have you here.

Roman: It’s my pleasure.

Lucas: I guess let’s jump into this. You can give us a little bit more information about your background, what you’re focusing on. Take us a little bit through the evolution of Roman Yampolskiy and the computer science and AI field.

Roman: Sure. I got my PhD in Computer Science and Engineering. My dissertation work was on behavioral biometrics. Typically, that’s applied to profiling human behavior, but I took it to the next level looking at nonhuman entities, bots, artificially intelligent systems trying to see if we can apply same techniques, same tools to detect bots, to prevent bots, to separate natural human behavior from artificial behaviors.

From there, I try to figure out, “Well, what’s the next step? As those artificial intelligence systems more capable, can we keep up? Can we still enforce some security on them?” That naturally led me to looking at much more capable systems and the whole issues with AGI and superintelligence.

Lucas: Okay. In terms of applying biometrics to AI systems or software or computers in general, what does that look like and what is the end goal there? What are the metrics of the computer that you’re measuring and to what end are they used and what information can they give you?

Roman: The good example I can give you is from my dissertation work again. I was very interested with poker at the time. The poker rooms online were still legal in US and completely infested with bots. I had a few running myself. I knew about the problem and I was trying to figure out ways to automatically detect that behavior. Figure out which bot is playing and prevent them from participating and draining resources. That’s one example where you just have some sort of computational resource and you want to prevent spam bots or anything like that from stealing them.

Lucas: Okay, this is cool. Before you’ve arrived at this AGI and superintelligence stuff, could you explain a little bit more about what you’ve been up to? It seems like you’ve done a lot in computer security. Could you unpack a little bit about that?

Roman: All right. I was doing a lot of very standard work relating to pattern recognition, neural networks, just what most people do in terms of work on AI recognizing digits and handwriting and things of that nature. I did a lot of work in biometrics, so recognizing not just different behaviors but face recognition, fingerprint recognition, any type of forensic analysis.

I do run Cybersecurity Lab here at the University of Louisville. My students typically work on more well recognized sub domains of security. With them, we did a lot of work in all those domains, forensics, cryptography, security.

Lucas: Okay. Do you feel that all the security research, how much of it do you think is important or critical to or feeds into ASI and AGI research? How much of it right now is actually applicable or is making interesting discoveries, which can inform ASI and AGI thinking?

Roman: I think it’s fundamental. That’s what I get most of my tools and ideas for working with intelligent systems. Basically, everything we learned in security is now applicable. This is just a different type of cyber infrastructure. We learned to defend computers, networks. Now, we are trying to defend intelligent systems both from insider threats and outside from the systems themselves. That’s a novel angle, but pretty much everything I did before is now directly applicable. So many people working in AI safety approach it from other disciplines, philosophy, economics, political science. A lot of them don’t have the tools to see it as a computer science problem.

Lucas: The security aspect of it certainly make sense. You’ve written on utility function security. If we’re to make value aligned systems, then it’s going to be important that the right sorts of people have control over them and that their preferences and dispositions and the systems, again, utility function is secure is very important. A system in the end I guess isn’t really safe or robust or value aligned if it’s extremely influenced by anyone.

Roman: Right. If someone can just disable your safety mechanism, do you really have a safe system? That completely defeats everything you did. You release a well-aligned, friendly system and then somebody flips a few bits and you got the exact opposite.

Lucas: Right. Given this research focus that you have in security and how it feeds into ASI and AGI thinking and research and AI alignment efforts, how would you just generally summarize your approach to AI alignment and safety?

Roman: There is not a general final conclusion I can give you. It’s still work in progress. I’m still trying to understand all the types of problems we are likely to face. I’m still trying to understand this problem as even solvable to begin with. Can we actually control more intelligent systems? I always look at it from engineering computer science point of view much less from philosophy ethics point of view.

Lucas: Whether or not this problem is in principle solvable, that has a lot to do with fundamental principles and ideas and facts about minds in general and what is possible of minds. Can you unpack a little bit more about what sorts of information we need or what we need to think about more going forward to know what it means whether or not this problem is solvable in principle, how we can figure that up as we continue forward?

Roman: There is multiple ways you can show that it’s solvable. The ideal situation is where you can produce some sort of a mathematical proof. That’s probably the hardest way to do it because it’s such a generic problem. It applies to all domains. It has to be still working under self-improvement and modification. It has to still work after learning of additional information and it has to be reliable against malevolent design, so purposeful modifications. It seems like it’s probably the hardest problem ever to be given to them. Mathematics community are willing to take it on.

You can also look at examples just from experimental situations both with artificial systems. Are we good at controlling existing AIs? Can we make them safe? Can we make software safe in general? Also, natural systems. Are we any good at creating safe humans? Are we good at controlling people? Now, it seems like after millennia of efforts coming up with legal framework, ethical framework, religions, all sorts of ways of controlling people, we are pretty much failing at creating safe humans.

Lucas: I guess in the end, that might come down to fundamental issues in human hardware and software. Like the reproduction of human beings through sex and the way that genetics functions just creates a ton of variance in each person, which each person has different dispositions and preferences and other things. Then also the way that I guess software is run and shared across culture and people. Creates more fundamental issues that we might not have in software and machines because they work differently.

Are there existence proofs I guess with AI where AI is superintelligent in a narrow domain or at least above human intelligence in a narrow domain and we have control over such narrow systems? Would it be potentially generalizable as you sort of aggregate more and more AI systems, which are superintelligent in narrow domains that as you aggregate that or create an AGI, which sort of has meta learning, we would be able to have control over it given these existence proofs in narrow domains?

Roman: There are certainly such examples in narrow domains. If we’re creating, for example, a system to play chess. We can have a single number measuring it’s performance. We can control whatever is getting better or worse. That’s quite possible and is very limited linear domain. The problem is as complexity increases, you go from this n-body problem equals one to n-body equals infinity, and that’s very hard to solve both computationally and in terms of just understanding what in that hyperspace of possibilities is a desirable outcome.

It’s not just gluing together a few narrow AIs like, “Okay, I have a chess playing program. I have a go playing program.” If I put them all in the same PC, do I now have general intelligence capable of moving knowledge across domains? Not exactly. Whatever safety you can prove for limited systems, not necessarily will transferred to a more complex system, which integrates the components.

Very frequently, then you add two safe systems, the merged system has back doors, has problems. Same with adding additional safety mechanisms. A lot of times, you will install a patch for software to increase security and the patch itself has additional loopholes.

Lucas: Right. It’s not necessarily the case that in the end, AGI is actually just going to be sort of like an aggregation of a lot of AI systems, which are superintelligent in narrow domains. Rather, it potentially will be something more like an agent, which has very strong meta learning. So, learning about learning and learning how to learn and just learning in general. Such that all the sort of process is in things that it learns or deeply integrated at a lower level and they’re sort of like a higher level thinking that is able to execute on these things that they learned. Is that so?

Roman: That makes a lot of sense.

Lucas: Okay. Moving forward here, it would be nice if we could go ahead and explore a little bit of the key concepts in your books and papers and maybe get into some discussions there. I don’t want to spend a lot of time talking about each of the terms and having you define them as people can read your book, Artificial Superintelligence: A Futuristic Approach. They can also check out your papers and you’ve talked about these in other places. I think it will be helpful for giving some background and terms that people might not exactly be exposed to.

Roman: Sure.

Lucas: Moving forward, what can you tell us about what AI completeness is?

Roman: It’s a somewhat fuzzy term kind of like Turing test. It’s not very precisely defined, but I think it’s very useful. It seems that there are certain problems in artificial intelligence in general which require you to pretty much have general intelligence to solve them. If you are capable of solving one of them, then by definition, we can reduce other problems to that one and solve all problems in AI. In my papers, I talk about passing Turing test as being the first such problem. If you can pass unrestricted version of a Turing test, you can pretty much do anything.

Lucas: Right. I think people have some confusions here about what intelligence is in the kinds of minds that can solve Turing tests completely and the architecture that they have and whether that architecture means they’re exactly intelligent. I guess some people have this kind of intuition or idea that you could have a sort of system that had meta learning and learning and was able to sort of think as a human does in order to execute a Turing test.

Then potentially, other people have an idea and this may be misguided where a sort of sufficiently complicated tree search or Google engine on the computer would be able to pass a Turing test and that seems potentially kind of stupid. Is the latter idea a myth? Or if not, how is it just as intelligent as the former?

Roman: To pass an unrestricted version of a Turing test, against someone who actually understands how AI works is not trivial. You can do it with just lookup tables and decision trees. I can give you an infinite number of completely novel situations where you have to be intelligent to extrapolate to figure out what’s going on. I think theoretically, you can think of an infinite lookup table which has every conceivable string for every conceivable previous sequence of questions, but in reality, it just makes no sense.

Lucas: Right. They’re going to be sort of like cognitive features and logical processes and things like inferences and extrapolation and logical tools that humans use that almost must necessarily come along for the ride in order to fully pass a Turing test.

Roman: Right. To fully pass it, you have to be exactly the same in your behavior as a human. Not only you have to be as smart, you also have to be as stupid. You have to repeat all the mistakes, all the limitations in terms of humanity, in terms of your ability to compute, in terms of your cognitive biases. A system has to be so smart that it has a perfect model of an average human and can fake that level of performance.

Lucas: It seems like in order to pass a Turing test, the system would either have to be an emulation of a person and therefore almost essentially be a person just on different substrate or would have to be superintelligent in order to run an emulation of a person or a simulation of a person.

Roman: It has to have a perfect understanding of an average human. It goes together with value alignment. You have to understand what a human would prefer or say or do in every situation and that does require you to understand humanity.

Lucas: Would that function successfully at a higher level of general heuristics about what an average person might do or does it require a perfect emulation or simulation of a person in order to fully understand what a person would do in such an instance?

Roman: I don’t know if it has to be perfect. I think there are certain things we can bypass and just going to read books about what a person would do in that situation, but you do have to have a model complete enough to produce good results in novel situations. It’s not enough to know, OK, most people would prefer ice cream over getting a beating, something like that. You have to figure out what to do in a completely novel set up where you can just look it up on Google.

Lucas: Moving on from AI completeness, what can you tell us about the space of mind designs and the human mental model and how this fits into AGI and ASI and why it’s important?

Roman: A lot of this work was started by Yudkowsky and other people. The idea is just to understand how infinite that hyperspace is. You can have completely different sets of goals and desires from systems which are very capable optimizers. They may be more capable than an average human or best human, but what they want could be completely arbitrary. You can’t make assumptions along the lines of, “Well, any system smart enough would be very nice and beneficial to us.” That’s just a mistake. If you randomly pick a mind from that infinite universe, you’ll end up with something completely weird. Most likely incompatible with human preferences.

Lucas: Right. This is just sort of, I guess, another way of explaining the orthogonality thesis as described by Nick Bostrom?

Roman: Exactly. Very good connection, but it gives you a visual representation. I have some nice figures where you can get a feel for it. You start with, “Okay, we have human minds, a little bit of animals, you have aliens in the distance,” but then you still keep going and going in some infinite set of mathematical possibilities.

Lucas: In this discussion of the space of all possible minds, it’s a discussion about intelligence where intelligence is sort of understood as the ability to change and understand the world and also the preferences and values which are carried along in such minds however random and arbitrary they are from the space of all possible mind design.

One thing which is potentially very important in my view is the connection of the space of all possible hedonic tones within mind space, so the space of all possible experience and how that maps onto the space of all possible minds. Not to say that there’s duality going on there, but it seems very crucial and essential to this project to also understand the sorts of experiences of joy and suffering that might come along for each mind within the space of all possible minds.

Is there a way of sort of thinking about this more and formalizing it more such as you do or does that require some more really foundational discoveries and improvements in the philosophy of mind or the science of mind and consciousness?

Roman: I look at this problem and I have some papers looking at those. One looks at just generation of all possible minds. Sequentially, you can represent each possible software program as an integer and brute force them. It will take infinite amount of time, but you’ll get to every one of them eventually.

Another recent paper looks at how we can actually detect qualia in natural and artificial agents. While it’s impossible for me to experience the world as someone else, I think I was able to come up with a way to detect whatever you have experiences or not. The idea is to present you with the illusions, kind of visual illusions and based on the type of body you have, the type of sensors you have, you might have experiences which match with mine. If they are not, then I can say really anything about you. You could be conscious and experiencing qualia or maybe not. I have no idea.

In a set of such tests on multiple illusions, you happen to experience exactly the same side effects from the illusion. This test drew multiple-choice questions and you can get any level of accuracy you want with just additional tests. Then I have no choice but to assume that you have exactly same qualia in their situation. So, at least I know you do have experiences of that type.

If it’s taking it to what you suggested pleasure or pain, we can figure out is there suffering going on, is there pleasure happening, but this is very new. We need a lot more people to start doing psychological experiments with that.

The good news is from existing literature, I found a number of experiments where a neutral network designed for something completely unrelated still experienced similar side effect as a natural model. That’s because the two models represent the same mathematical structure.

Lucas: Sorry. The idea here is that by observing effects on the system that if those effects are also correlated or seen in human subjects that this is potentially some indication that the qualia that is correlated with those effects in people is also potentially experienced or seen in the machine?

Roman: Kind of. Yeah. So, when I show you a new cool optical illusion. You experienced something outside of just the values of bits in that illusion. Maybe you see light coming out of it. Or maybe you see rotations. Maybe you see something else.

Lucas: I see a triangle that isn’t there.

Roman: Exactly. If a machine reports exactly the same experience without previous knowledge obviously, then just Google what a human would see. How else would you explain that knowledge, right?

Lucas: Yeah. I guess I’m not sure here. I probably need to think about it more actually, but this does seem like a very important approach in place to move forward. The person in me who’s concerned about thinking about ethics looks back on the history of ethics and thinks about how human beings are good at optimizing the world in ways in which it produces something of value to them but in optimizing for that thing, they produce huge amounts of suffering. We’ve done this through subjugation of women and through slavery and through factory farming of animals currently and previously.

After each of these periods, of these morally abhorrent behaviors, it seems we have an awakening and we’re like, “Oh, yeah, that was really bad. We shouldn’t have done that.” I guess just moving forward here with machine intelligence, it’s not clear that this will be the case or it is possible that it could be the case, but it may. Potentially sort of the next one of these moral catastrophes is if we sort of ignore this research into the possible hedonic states of machines and just brush it away as being dumb philosophical stuff that we potentially could produce an enormous amount of suffering in machine intelligence and just sort of override that and create another ethical catastrophe.

Roman: Right. I think that makes a lot of sense. I think qualia, a side effect of certain complex computations. You can’t avoid producing them if you’re doing this type of thinking, computing. We have to be careful once we get to that level of not having very painful side effects.

Lucas: Is there any possibility here of trying to isolate the neural architectural correlates of consciousness in human brains and then physically or digitally instantiating that in machines and then creating a sort of digital or physical corpus callosum between the mind of a person and such a digital or physical instantiation of some neural correlate of something in the machine in order to see if an integration of those two systems creates a change in qualia for the person? Such that the person could sort of almost first-person confirm that when it connects up to this thing that its subjective experience changes and therefore maybe we have some more reason to believe that this thing independent of the person, when they disconnect, has some sort of qualia to it.

Roman: That’s very interesting type of experiment I think. I think something like this has been done with Siamese twins conjoined with brain tissue. You can start looking at those to begin with.

Lucas: Cool. Moving on from the space of mind designs and human mental models, let’s go ahead and then talk about the singularity paradox. This is something that you cover quite a bit in your book. What can you tell us about the singularity paradox and what you think the best solutions are to it?

Roman: It’s just a name for this idea that you have a superintelligent system, very capable optimizer, but it has no common sense as we human perceive it. It’s just kind of this autistic savant capable of making huge changes in the world but a four-year-old would have more common sense in terms of disambiguation of human language orders. Just kind of understanding the desirable states of the world.

Lucas: This is sort of the fundamental problem of AI alignment. The sort of assumption about the kind of mind AGI or ASI will be, the sort of autistic savant sort of intelligence, what that is … This is what Dylan Hadfield-Menell brought up on our first podcast for the AI Alignment Series is that for this case of this autistic savant that most people have in mind, a perfectly rational Bayesian optimizing agent. Is that sort of the case? Is that the sort of mind that we have in mind when we’re thinking of this autistic savant that just blows over things we care about because it’s just optimizing too hard for one thing and Goodhardt’s law starts to come into effect?

Roman: Yes, in a way. I always try to find most simple examples so we can understand better in the real world. Then you have people with extremely high level of intelligence. The concerns they have, the issues they find interesting are very different from your average person. If you watch something like Big Bang Show with Sheldon, that’s like a good to funny example of this on a very small scale. There is maybe 30 IQ point difference, but what if it’s 300 points?

Lucas: Right. Given the sort of problem, what are your conclusions and best ideas or best practices for working on this? Working on this is just sort of working on the AI alignment problem I suppose.

Roman: AI alignment is just a new set of words to say we want the safe and secure system, which kind of does what we designed it to do. It doesn’t do anything dangerous. It doesn’t do something we disagree with. It’s well aligned with our intention. By itself, the term adds nothing new. The hard problem is, “Well, how do we do it?”

I think it’s fair to say that today, as of right now, no one in the world has a working safety mechanism capable of controlling intelligent behavior and scaling to a new level of intelligence. I think even worse is that no one has a prototype for such a system.

Lucas: One thing that we can do here is we can sort of work on AI safety and we can think about law, policy and governance to try and avoid an arms race in AGI or ASI. Then there are also important ethical questions which need to be worked on before AGI some of which including kind of more short-term things, universal basic income and bias and discrimination in algorithmic systems. How AI will impact the workforce and other things and potentially some bigger ethical questions we might have to solve after AGI if we can pull the brakes.

In terms of the technical stuff, one important path here is thinking about and solving the confinement problem, the method by which we are able to create an AGI or ASI and air gap it and make it so that it is confined and contained to be tested in some sort of environment to see if it’s safe. What are your views on that and what do you view as a potential solution to the confinement problem?

Roman: That’s obviously a very useful tool to have, to test, to debug, to experiment with an AI system while it’s limited in its communication ability. It cannot perform social engineering attacks against the designer or anyone else. It’s not the final solution if you will if a system can still escape from such confinement, but it’s definitely useful to be able to do experiments on evolving learning AI.

Can I limit access to the Internet? Can I limit access to knowledge, encyclopedia articles? Can I limit output in terms of just text, no audio, no video? Can I do just a binary yes or no? All of it is extremely useful. We have special air gap systems for studying computer viruses, so to understand how they work, how they communicate versus just taking it to the next level of malevolent software.

Lucas: Right. There’s sort of this, I guess, general view and I think that Eliezer has participated in some of these black boxing experiments where you pretend as if you are the ASI and you’re trying to get out of the box and you practice with other people to see if you can get out of the box. Out of discussions and thinking on this, it seems that some people thought that it’s almost impossible to confine these systems. Do you think that, that’s misguided or what are your views on that?

Roman: I agree that long-term, you absolutely cannot confine a more intelligent system. I think short-term while it’s still developing and learning, it’s a useful tool to have. The experiments Eliezer did, very novel at the time, but I wish he meet public all the information to make them truly scientific experiments where people can reproduce them properly, learn from them. Simply saying that this guy who now works with me let me out, it’s not the optimal way to do it.

Lucas: Right. I guess the concern there is with confinement experiments is that explaining the way in which it gets out is potentially an information hazard.

Roman: Yeah. People tend to call a lot of things informational hazards. Those things certainly exist. If you have source code for AGI, I strongly recommend you don’t make it public, but we’ve been calling a lot of things informational hazard I think.

The best example is Roko’s basilisk where essentially it was a new way to introduce Christianity. If I tell you about Jesus and you don’t follow him, now you’re going to hell. If I didn’t tell you about Jesus, you’d be much better off. Why did you tell me? Deleting it just makes it grow bigger and it’s like Streisand effect, right? You promoting this while you trying to suppress it. I think you have to be very careful in calling something an informational hazard, because you’re diluting the label by doing that.

Lucas: Here’s something I think we can potentially get into the weeds on and we may disagree about and have some different views on. Would you like to just go ahead and unpack your belief? First of all, go ahead and explain what it is and then explain your belief about why machine ethics in the end is the wrong approach or a wrong instrument in AI alignment.

Roman: The way it was always done in philosophy typically, everyone tried to publish a paper suggesting, “Okay, this is a set of ethics we need to follow.” Maybe it’s ethics based on Christianity or Judaism. Maybe it’s utilitarianism, whatever it is. There was never any actual solution, anything was proposed which could be implemented as a way to get everyone on board and agree with it. It was really just a competition for like, “Okay, I can come up with a new ethical set of constraints or rules or suggestions.”

We know philosophers have been trying to resolve it for millennia. They failed miserably. Why somehow moving it from humans to machines will make it easier problem to solve where a single machine is a lot more powerful and can do a lot more with this is not obvious to me. I think we’re unlikely to succeed by doing that. The theories are contradictory, ill-defined, they compete. It doesn’t seem like it’s going to get us anywhere.

Lucas: To continue unpacking your view a bit more, instead of machine ethics where we can understand machine ethics as the instantiation of normative and meta-ethical principles and reasoning and machine systems to sort of make them moral agents and moral reasoners, your view is that instead of using that, we should use safety engineering. Would you like to just unpack what that is?

Roman: To return to the definition you proposed. For every ethical system, there are edge cases which backfire tremendously. You can have an AI which is a meta-ethical decider and it figures out, “Okay, the best way to avoid human suffering is do not have any humans around.” You can defend it from philosophical point of view, right? It makes sense, but is that a solution we would accept if a much smarter system came up with it?

Lucas: No, but that’s just value misalignment I think. I don’t think that there are any sort of like … There are, in principle, possible moral systems where you say suffering is so bad that we shouldn’t risk any of it at all ever, therefore life shouldn’t exist.

Roman: Right, but then you make AI the moral agent. That means it’s making moral decisions. It’s not just copying what humans decided even if we can somehow figure out what the average is, it’s making its own novel decisions using its superintelligence. It’s very likely it will come up with something none of us ever considered. The question is, will we like it?

Lucas: Right. I guess just for me here, I understand why AI safety engineering and technical alignment efforts are so very important and intrinsic. I think that it really constitutes a lot of the AI alignment problem. I think that given that the universe has billions and billions and billions of years left to live, that the instantiation of machine ethics in AGI and ASI is… you can’t hold off on it and it must be done.

You can’t just have an autistic savant superspecies on the planet that you just never imbue with any sort of ethical epistemology or meta-ethics because you’re afraid of what might happen. You might want to do that extremely slowly and extremely carefully, but it seems like machine ethics is ultimately an inevitability. If you start to get edge cases that the human beings really don’t like, then potentially you just went wrong somewhere in cultivating and creating its moral epistemology.

Roman: I agree with doing it very slowly and carefully. That seems like a good idea in general, but again, just projecting to long-term possibilities. I’m not optimistic that the result will be beneficial.

Lucas: Okay. What is there left to it? If we think of the three cornerstones of AI alignment as being law, policy, governance, then we have ethics on one corner and then we have technical AI alignment on the other corner. We have these three corners.

If we have say AGI or ASI around 2050, which I believe is something a lot of researchers give a 50% probability to, then imagine we simply solve technical AI alignment and we solved the law, policy and governance coordination stuff so that we don’t end up having an arms race and we mess up on technical alignment. Or someone uses some singleton ASI to malevolently control everyone else.

Then we still have the ethical issues in the end. Even if we have a perfectly corrigible and docile intelligence, which is sort of tuned to the right people and sort of just takes the right orders. Then whatever that ASI does, it’s still going to be a manifestation, an embodiment of the ethics of the people who tell it what to do.

There’s still going to be billions and billions of years left in the universe. William MacAskill discusses this. Is that sort of after we’ve solved the technical alignment issues and the legal and political and coordination issues, then we’re going to need a period of long deliberation where we actually have to make concrete decisions about moral epistemology and meta-ethics and try and do it in really a formalized and rigorous way and potentially take thousands of years to figure it out.

Roman: I’m criticizing this and that makes it sound like I have a solution, which is something else and I don’t. I don’t have a solution whatsoever. I just feel it’s important to point out problems with each specific approach so we can avoid problems of over committing to it.

You mentioned a few things. You mentioned getting information from the right people. That seems like that’s going to create some problems right there. Not sure who the right people are. You mentioned spending thousands of years deciding what we want to do with this superintelligent system. I don’t know if we have that much time given all the other existential risks, given the chance of malevolent superintelligence being released by rogue agents much sooner. Again, it may be the best we got, but it seems like there are some issues we have to look at.

Lucas: Yeah, for sure. Ethics has traditionally been very messy and difficult. I think a lot of people are confused about the subject. Based on my conversation with Dylan Hadfield-Menell, when we’re discussing inverse reinforcement learning and other things that he was working on, his sort of view was a view of AI alignment and value alignment where inverse reinforcement learning and other preference learning techniques are sort of used to create a natural evolution of human values and preferences in ethics, which sort of exists in an ecosystem of AI systems which are all, I guess, in conversation so that it could, more so, naturally evolve.

Roman: Natural evolution is a brutal process. It really has no humanity to it. It exterminates most species. I don’t know if that’s the approach we want to simulate.

Lucas: Not an evolution of ideas?

Roman: Again, if those ideas are actually implemented and applied to all of humanity that has a very different impact than if it’s just philosophers debating with no impact.

Lucas: In the end, it seems like a very difficult end frontier to sort of think about and move forward on. Figuring out what we want and what we should do with a plurality of values and preferences. Whether or not we should take a view of moral realism or moral relativism or anti-realism about ethics and morality. Those seem like extremely consequential views or positions to take when determining the fate of the cosmic endowment.

Roman: I agree completely on how difficult the problem is.

Lucas: Moving on from machine ethics, you wrote a  paper on leak proofing the singularity. Would you like to go ahead and unpack a little bit about what you’re doing in the paper and how that ties into all of this?

Roman: That’s just AI boxing. That was the response to David Chalmers’ paper and he talks about AI boxing as leak proofing, so that’s the title we used, but it’s just a formalization of the whole process. Formalization of the communication channel, what goes in, what goes out. It’s a pretty good paper on it. Again, it relies in this approach of using tools from cyber security to formalize the whole process.

For a long time, experts in cyber security attempted to constrain regular software, not intelligent software from communicating with our programs and outside world and operating system. We’re looking at how that was done, what different classifications they used for site channels and so on.

Lucas: One thing that you also touch on, would you like to go ahead and unpack like wireheading addiction and mental illness in general in machine systems and AI?

Roman: It seems like there is a lot of mental disorders, people experience. The only example of general intelligence we have. More and more, we see similar problems show up in artificial systems, which try to emulate this type of intelligence. It’s not surprising and I think it’s good that we have this body of knowledge from psychology which we can now use to predict likely problems and maybe come up with some solutions for them.

Wireheading is essentially this idea of agent not doing any useful work but just stealing their work channel. If you think about having kids and there is a cookie jar and they get rewarded every time they clean the room or something like that with a cookie, well, they essentially can just find the cookie jar and get direct access to their work channel, right? They’re kids, so they’re unlikely to cause much harm, but if a system is more capable, it realizes you as a human control the cookie jar, well now, it has incentive to control you.

Lucas: Right. There are also these examples with rats and mice that you might be able to discuss a little bit more.

Roman: The classic experiments on that just created through surgery, electrode implants in a brain of some simple animals. Every time you provided an electrical shock to that area, the animals experience the maximum pleasure like orgasm you don’t get tired of. They bypass getting food, having sex, playing with toys. They just sat there pressing the button. If you made it where they have to walk on electrocuted fence to get to the button, it wasn’t a problem, they would do that. It completely messes with usefulness of an agent.

Lucas: Right. I guess just in terms of touching on the differences and the implications of ethics here is that one with sort of consequentialist views, which was sort of very impartial and on speciesists can potentially view wireheading as ethical or the end goal. Whereas other people view a wireheading as basically abhorrent and akin to something terrible that you would never want to happen. There’s also again, I think, a very interesting ethical tension there.

Roman: It goes, I think, to the whole idea of simulated reality and virtual world. Do you care if you’re only succeeding in a made-up world? Would that make you happy enough or do you have to actually impact reality? That could be part of resolving our differences about values and ethics. If every single person can be in their own simulated universe where everything goes according to their wishes, is that a solution to getting us all to agree? You know it’s a fake universe, but at least you’re the king in it.

Lucas: I guess that also touches on this question of the duality that human beings have created between what is fake and real. In what sense is something really fake if it’s not just the base reality? Is there really fundamental value in the thing being the base reality and do we even live in the base reality? How does cosmology or ideas that Max Tegmark explores about the multiverse sort of even impact that? How will that impact our meta-ethics and decision-making about the moral worth of wireheading and simulated worlds?

Roman: Absolutely. I have a paper on something I call designer metry, which is measuring natural versus artificial. The big question of course is can we tell if you are living in a simulated reality? Can it be measured scientifically? Or was it just a philosophical idea? It seems like there are certain ways to identify signals from the engineer if it’s done on purpose, but in general case, you can never tell whatever something is a deep fake or a real input.

Lucas: I’d like to discuss that a little bit more with you, but just to backup really quick to finish talking on about psychology and AI. It seems like this has been something that is really growing in the AI community and it’s not something that I really know much about at all. My general understanding is as AI systems become more and more complex, it’s going to be much more difficult to diagnose and understand the specific pathways and architectures, which are leading to mental illness.

Therefore, general diagnosable tools which observe and understand higher level phenomena or behaviors that systems exist that we’ve developed in psychology would be helpful or implementable here. Is that sort of the case and the use case of psychology here is really just diagnose mental illnesses or does it also has a role in developing positive psychology and well-being in machine systems?

Roman: I think it’s more of a first case. If you have a black box AI, just a huge, very deep neural network, you can just look at the wiring and weights and figure out why it’s producing the results you’re seeing. Whereas you can do high-level experiments, maybe even conversation with the system to give you an idea of how it’s misfiring what the problem is.

Lucas: Eventually, if we begin exploring the computational structure of different hedonic tones and that becomes more formalized as a science, then I don’t know, maybe potentially, there would be more of a role for psychologists in discussing the well-being part rather than the computational mental illness part.

Roman: It is a very new concept. It’s been mentioned a lot in science fiction, but as a scientific concept, it’s very new. I think there is only one or two papers on it directly. I think there is so much potential to exploring more on connections with neuroscience. I’m actually quite excited about it.

Lucas: That’s exciting. Are we living in a simulated world? What does it mean to be able to gather evidence about whether or not we’re living in a simulation? What would such evidence look like? Why may we or may not ever be able to tell whether or not we are in a simulation?

Roman: In general case, if there is not an intent to let you know that it’s a simulated world, you would never be able to tell. Absolutely anything can actually be part of natural base system. You don’t know what it’s like if you are Mario playing in an 8-bit world. You have no idea that it’s low resolution. You’re just part of that universe. You assume the base is the same.

There are situations where engineers leave trademarks, watermarks, helpful messages in a system to let you know what’s going on, but that’s just giving you the answer. I think in general case, you can never know, but from statistical arguments, there’s … Nick Bostrom presents a very compelling statistical arguments. I do the same for biological systems in one of my papers.

Roman: It seems more likely that we are not the base just because every single intelligent civilization will produce so many derived civilizations from it. From space exploration, from creating biological robots capable of undergoing evolutionary process. It would be almost a miracle if out of thousands and thousands of potential newly designed organisms, newly evolved ones, we were like the first one.

Lucas: I think that, that sort of evolutionary process presumes that the utility function of the optimization process, which is spreading into the universe, is undergoing an evolutionary process where it’s changing. Whereas the security and brittleness and stability of that optimization process might be very fixed. It might be that all future and possible super advanced civilizations do not converge on creating ancestor simulations.

Roman: It’s possible, but it feels like a bit less likely. I think they’ll still try to grab the resources and the systems may be fixed in certain values, but they still would be adopting to the local environment. We just see it with different human populations, right? We’re essentially identical, but we developed very different cultures, religions, food preferences based on the local available resources.

Lucas: I don’t know. I feel like I could imagine like a civilization, a very advanced one coming down on some sort of hedonic consequentialism where the view is that you just want to create as many beautiful experiences as possible. Therefore, there wouldn’t be any room for simulating evolution on Earth and all the suffering and kind of horrible things we have to go through.

Roman: But you’re looking at it from inside the simulation. You don’t know what the reasons are on the outside, so this is like a video game or going to the gym. Why would anyone be killed in a video game or suffer tremendously, lifting heavy weights in a gym, right? It’s only fun when you understand external reasons for it.

Lucas: I guess just two things here. I just have general questions on. If there is a multiverse at one or another level, would it then also be the case that the infinity of simulated universes would be a larger fraction of the infinity of the multiverse than the worlds which were not simulated universes?

Roman: This is probably above my pay grade. I think Max is someone who can give you a better answer in that. Comparing degrees of infinities is hard.

Lucas: Okay. Cool. It is not something I really understand either. Then I guess the other thing is I guess just in general, it seems queer to me that human beings are in a world and that we look at our computer systems and then we extrapolate what if these computer systems were implemented at a more base level. It seems like we’re trapped in a context where all that we have to extrapolate about the causes and conditions of our universe are the most fundamental things that we can observe from within our own universe.

It seems like settling on the idea of, “Okay, we’re probably in a simulation,” just seems kind of like we’re gluing to and finding a cosmogenesis hope in one of the only few things that we can, just given that we live in a universe where there are computers. Does that make sense?

Roman: It does. Again, from inside the simulation, you are very limited in understanding the big picture. Then so much would be easier to understand if we had external knowledge, but it’s just not the option we have so far. We learn by pretending to be the engineer in question and now we design virtual worlds. We design intelligent beings and the options we have is the best clue we have about the options available to whoever does it in the external level.

Lucas: Almost as if Mario got to the end of the level and got to the castle. Then because you got to the castle the next level or world started, he was like maybe outside of this context there’s just a really, really big castle or something that’s making lower levels of castles exist.

Roman: Right. I agree with that, but I think we have in common this mathematical language. I think that’s still universal. Just by studying mathematics and possible structures and proving things, we can learn about what’s possible and impossible.

Lucas: Right. I mean there’s just really foundational and fundamental question about the metaphysical realism or anti-realism of mathematics. If there is a multiverse or like a meta multiverse or like a meta-meta-meta-multiverse levels …

Roman: Only three levels.

Lucas: I guess just the implications of a mathematical realism or Platonism or sort of anti-realism at these levels would have really big implications.

Roman: Absolutely, but at this point, I think it’s just fun to think about those possibilities and what they imply for what we’re doing, what we’re hoping to do, what we can do. I don’t think it’s a waste of time to consider those things.

Lucas: Just generally, this is just something I haven’t really been updated on. Is this rule about only in three levels of regression, is that just sort of a general principle or role kind of like Occam’s razor that people like to stick by? Or is there any more…?

Roman: No. I think it’s something Yudkowsky said and it’s cute and kind of meme like.

Lucas: Okay. So it’s not like serious epistemology?

Roman: I don’t know how well proven that is. I think he spoke about levels of recursion initially. I think it’s more of a meme.

Lucas: Okay. All right.

Roman: I might be wrong in that. I know a lot about memes, less about science.

Lucas: Me too. Cool. Given all this and everything we’ve discussed here about AI alignment and superintelligence, what are your biggest open questions right now? What are you most uncertain about? What are you most looking for key answers on?

Roman: The fundamental question of AI safety, is it solvable? Is control problem solvable? I have not seen a paper where someone gives mathematical proof or even a rigorous argument. I see in some blog posts arguing, “Okay, we can predict what the chess machine will do, so surely we can control superintelligence,” but it just doesn’t seem like it’s enough. I’m working on a paper where I will do my best to figure out some answers for that.

Lucas: what is the definition of control and AI alignment?

Roman: I guess it’s very important to formalize those before you can answer the question. If we don’t even know what we’re trying to do, how can we possibly succeed? The first step in any computer science research project is to show that your problem is actually solvable. Some are not. We know, for example, holding problem is not solvable, so it doesn’t make sense to give it as an assignment to someone and wait for them to solve it. If you give them more funding, more resources, it’s just a waste.

Here, it seems like we have more and more people working very hard in different solutions, different methods, but can we first spend a little bit of time seeing how successful can we be? Based on the answer to that question, I think a lot of our governance and the legal framework and general decision-making about this domain will be impacted by it.

Lucas: If your core and key question here is whether or not the control problem or AI alignment is, in principle, or fundamentally solvable, could you give us a quick crash course on complexity theory and computational complexity theory and just things which take polynomial time to solve versus exponential time?

Roman: That’s probably the hardest course you’ll take as an undergraduate in computer science. At the time, I hated every second of it. Now, it’s my favorite subject. I love it. This is the only professor whom I remember teaching computational complexity and computability.

To simplify it, there are different types of problems. Surprisingly, almost all problems can be squeezed into one of those boxes. There are easy problems, which we can just quickly compute. Your calculator adding 2+2 is an example of that. There are problems where we know exactly how to solve them. It’s very simple algorithm. We can call it brute force. You try every option and you’ll always get the best answer, but there’s so many possibilities that in reality you can never consider every option.

Lucas: Like computing prime numbers.

Roman: Well, computer numbers are NP. It’s polynomial to test if a number is prime. It’s actually one of somewhat recent paper for the last 10 years, a great result, Ps are N prime. There are problems which are called NP complete and those are usually the interesting problems we care about and they all reduce to each other. If you solve one, you solved all of them. You cannot brute force them. You have to find some clever heuristics to get approximate answers, optimize those.

We can get pretty close to that. Examples like traveling salesperson problem. If you can figure out optimal way to deliver pizza to multiple households, if you can solve it in general case, you’ll solve 99% of interesting problems. Then there are some problems which we know no one can ever solve using Von Neumann architecture, like standard computer architecture. There are proposals for hyper computation computers with oracles, computers with all sorts of magical properties which would allow us to solve those very, very, very difficult problems, but that doesn’t seem likely anytime soon.

The best part of it I think is this idea of oracles. An oracle is a machine capable of doing magic to give you answer to otherwise unsolvable problem, and there are degrees of oracles. There are magical machines, which are more powerful magicians than the magical machine. None of it is working in practice. It’s all purely theoretical. You start learning about different degrees of magic and it’s pretty cool.

Lucas: Learning and understanding about what, in principle, is fundamentally computationally possible or feasible in certain time frames within the universe given the laws of physics that we have seems to be foundationally important and interesting. It’s one of, I guess, the final frontiers. Not space, but I guess solving intelligence and computation and also the sort of hedonic qualia that comes along for the ride.

Roman: Right. I guess the magical aspect allows you to escape from your local physics and consider other types of physics and what would be possible outside of this world.

Lucas: What advances or potential advances in quantum computing or other sorts of more futuristic hardware and computational systems help and assist in these problems?

Roman: I think quantum computing has more impact on the cryptography and security in that way. It impacts some algorithms more directly. I don’t think there is a determined need for it right now in terms of AI research or AI safety work. It doesn’t look like a human brain is using a lot of quantum effects though some people argue that it’s important for consciousness. I’m not sure if there is definitive proof of that experimentally.

Lucas: Let’s go ahead now and turn to some questions that we’ve gotten from our audience.

Roman: Sounds good.

Lucas: I guess we’re going to be jumping around here between narrow and short-term AI and some other questions. It would be great if you could let me know about the state of safety and security in current AI in general and the evaluation and verification and validation approaches currently adopted by the industry.

Roman: In general, the state of safety and security in AI is almost nonexistent. It’s kind of we’re repeating history. When we worked on creating Internet security was not something we cared about and so Internet is completely insecure. Then was started work on Internet 2.0, Internet of things. We’re repeating the same mistake. All those very cheap devices made in China have no security but they’re all connected and that’s how you can create swarms of devices attacking systems.

It is my hope that we don’t repeat this with intelligent systems, but right now it looks like we are. We care about getting them to the market as soon as possible, making them as capable as possible, the soonest possible. Safety and security is something most people don’t know about, don’t care about. You can see it in terms of number of researchers working on it. You can see it in terms of percentage of funding allocated to AI safety. I’m not too optimistic so far, but the field is growing exponentially, so that’s a good sign.

Lucas: How does evaluation and verification and validation fit into all of this?

Roman: We have pretty good tools for verifying critical software. Something so important… you’re flying to mars, the system cannot fail. Absolutely. We can do mathematical proofs to show that the code you created matches the design you had. It’s an expensive process, but we can do a pretty good job with it. You can put more resources into verifying it with multiple verifiers. You can get any degree of accuracy you want as a cost of computational resource.

As far as I can tell, there is no or very little successful work on verifying systems which are capable of self-improvement, changing, dynamically learning, operating in novel environments. It’s very hard to verify something where you have no idea what the behavior should be in the first beforehand. If it’s something linear, again, we have a chess computer, we know what it’s supposed to do exactly. It’s a lot easier to verify than something more intelligent than you operating a new data in a new domain.

Lucas: Right. It seems like verification in this area of AI is going to require some much more foundational and difficult proofs and verification techniques here. It seems like you’re saying it also requires an idea of an end goal of what the system is actually intended to do in order to verify that it satisfies that.

Roman: Right. You have to verify it against something. I have a paper on unverifiability where I talk about mathematical fundamental limits to what we can prove and verify mathematically. Already, we’re getting to the point where our mathematical proofs are so complex and so long, most human mathematicians cannot possibly even check if it’s legitimate or not.

We have examples of proofs where a mathematical community as a whole still has not decided if something published 10 years ago is a valid proof. If you’re talking about doing proofs on a black box AI systems, now it seems like the only option we have is another AI mathematician, verify our AI, assisting us with that, but this creates this multiple levels of interaction where who’s verifying, verifiers and so on.

Lucas: It seems to me at least another expression of how deeply interdependent the AI alignment problem is. Technical AI alignment is a core issue, but it seems like even in simple things, or not simple things, but things which you would imagine to at least be purely relegated to computer science also has some sort of connections with ethics and policy and law and how these things will all sort of require each other in order to succeed in AI alignment.

Roman: I agree. You do need this complete picture. Overall, I mentioned it a few times before in other podcasts. It feels like an AI safety, every time we analyze a problem, we discovered that it’s like a fractal. There is then more problems under that one and you do it again. Despite the three levels, you still continue with this. It’s an infinite process.

We never get to a point where, “Okay, we solved this. This is not a problem anymore. We know for sure it works in every conceivable situation.” That’s a problem. You have this infinite surface you have to defend, but you only have to fail once to lose everything. It’s very, very different from standard cyber security where, “Okay, somebody stole my credit card. I’ll just get a new one. I’ll get to try again.” Very different approach.

Lucas: There’s no messing up with artificial superintelligence.

Roman: Basically.

Lucas: Just going off of what we were talking about earlier in terms of how AI safety researchers are flirting and interested in the applications of psychology in AI safety, what do you think about the potential future relationship between AI and neuroscience?

Roman: That is great work in neuroscience and trying to understand measurements from just observing neurons, cells to human behavior. There are some papers showing if we do the same thing with computer processors, we’re just going to get a very good microscope and look at the CPU. “Was it playing a video game? Can we figure out connections between what Mario is doing and what electrical wiring is firing and so on?”

There seems to be a lot of mistakes made in that experiment. That tells us that the neuroscience experiments we’re doing for a very long time may be providing some less-than-perfect data for us. In a way, by doing AI work, we can also improve on our understanding of human brain, medical science, just general understanding of how neural networks work. It’s a feedback loop. That is progress in either one benefits the other.

Lucas: It seems like people like Josh Tenenbaum are working on more neuro inspired approaches to creating AGI. It seems that there are some people who have the view or the philosophy that the best way to getting to general intelligence is probably going to be understanding and studying human beings because we’re in existence proof that can be studied of general intelligences. What are your views on this approach and the work being done there?

Roman: I think it’s a lot easier to copy answers to get to the results. In terms of developing capable system, I think it’s the best option we have. I’m not so sure it leads to a safe system because if you just copy design, you don’t fully understand it. You can replicate it without complete knowledge and then instilling safety into it as a an afterthought, as a add-on later on, maybe even more difficult than if you designed it from scratch yourself.

Lucas: A more general strategy and approach, which gets talked about a lot in the effective altruism community: there seems to be this view and you can correct me here anywhere I might get this narrative sort of wrong. It seems important to build the AGI safety community, the AI safety community in general, by bringing more researchers into the fold.

If we can slow down the people who are working on capability and raw intelligence and bring them over to safety, then that might be a very good thing because it slows down the creation of the intelligence part of AGI and puts more researchers into the part that’s working on safety and AI alignment. Then there’s also this tension where …

While, that is a good thing. It may be a bad thing for us to be promoting AI safety or AGI safety to the public community because they probably just … Journalists would spin it and ruin it and trivialize it, turn it into a caricature of itself and just put Terminator photos on everything, which we at FLI are very aware that journalists like to put Terminator stuff on people’s articles and publications. What is your general view about AI safety outreach and do you disagree with the respectability first approach?

Roman: I’m an educator. I’m a professor. It’s my job to teach students, to educate the public, to inform everyone about science and hopefully more educated populace would benefit all of us. Research is funded through taxpayer grants. The public university is funded through taxpayers. The students paying tuition, the general public essentially.

If our goal is to align AI with values of the people, how can we keep people in the dark? They’re the ones who are going to influence elections. They are the ones who are going to decide what good governance of AI essentially is by voting for the right people. We put so much effort into governance of AI. We have efforts at UN, European Parliament, White House, you name it. There are now agreements between France and Canada on what to do with that.

At the end of the day, politicians listen to the public. If I can educate everyone about what the real issues in science are, I think it’s a pure benefit. It makes sense to raise awareness of long-term issues. We do it in every other field of science. Would you ever suggest it’s not a good idea to talk about climate change? No, of course not. It’s silly. We all participate in the system. We’re all impacted by the final outcome. It’s important to provide the good public outreach.

If your concern is the picture of a title of an article, well  work with better journalists, tell them you cannot use a picture of a Terminator. I do it. I tell them and they end up putting a very boring picture on it and nobody clicks on it. Is Terminator then an educational tool? I was able to explain some advanced computability concepts in a few minutes with simple trivial examples. Then you educate people, you have to come to their level. You have to say, “Well, we do have concerns about military killer robots.” There’s nothing wrong with that, so maybe funding for killer robots should be reduced. If public agrees, that’s wonderful.

Just kind of going if an article I published or somebody interviewed me is less than perfect, then it’s not beneficial, I disagree with it completely. It’s important to get to the public, which is not already sold on the idea. Me doing interview for you right now, right? I’m preaching to the choir. Most of your listeners are into AI safety I’m sure. Or at least effective altruism.

Whereas if I do interview for BBC or something like that, now I’m getting access to millions of people who have no idea what superintelligence is. In my world and your world, this is like common knowledge, but I give a lot of keynotes and I would go and speak to top executives for accounting firms and I ask them basic questions about technology. Maybe one ever heard about superintelligence as a concept.

I think education is always a good thing. Having educated populace is wonderful because that’s where funding will eventually come from for supporting our research and for helping us with AI governance. I’m a very strong supporter of outreach and I highly encourage everyone to do very good articles on it. If you feel that a journalist misrepresents your point of view, get in touch, get it fixed. Don’t just say that we’re going to left public in a dark.

Lucas: I definitely agree with that. I don’t really like this elitism that is part of the culture within some parts of AI safety community, which thinks that only the smartest, most niche people should be aware of this and working on it given the safety concerns and the ways in which it could be turned into something else.

Roman: I was a fellow at the Singularity Institute for Artificial Intelligence what is now MIRI. At that time, they had a general policy of not publishing. They felt it was undesirable and will cause more damage. Now, they publish extensively. I had mentioned that, that’s maybe a good idea a few times.

The general idea of buying out top AI developers and turning them to the white side I guess and working on safety issues, I think that’s wonderful. We want the top people. It doesn’t mean we have to completely neglect less than big names. Everyone needs to be invited to the table in terms of support, in terms of grants. Don’t try to think that reputation means that only people at Harvard and MIT work in AI safety.

There is lots of talent everywhere. I work with remote assistance from around the world. There is so much talent out there. I think the results speak for themselves. I get invited to speak internationally. I advise governments, courts, legislative system. I think reputation only grows with such outreach.

Lucas: For sure and it seems like the education on this, because it can seem fairly complicated and people can be really confused about it because I think that there are lots of common myths that people have about intelligence and “consciousness construed” in some way other than how I think you or I construe the term consciousness or the idea of free will or what it means to be intelligent. There’s just so much room for people to be confused about this issue.

The issue is real and it’s coming and people are going to find out about it whether or not we discuss it now. It seems very important that this happens, but also because like … It seems we also exist in a world where something like 40% to 50% of our country is at least skeptical about climate change. Climate change education and advocacy is very important and should be happening.

Even with all of that education and advocacy, there’s still something like around 40% of people who are skeptical about climate change. That issue has become politicized where people aren’t necessarily interested in facts. At least the skeptics are committed to party lines on the issue.

Roman: What would it be without education, if they never heard about the issue, would percentage be zero?

Lucas: I’m not advocating against education. I’m saying that this is an interesting existence case and saying like, “Yeah, we need more education about AI issues and climate change issues in general.”

Roman: I think there are maybe even more disagreement, not so much about how true of a problem is, but how to fix it. It turns into a political issue, then you start talking about let’s increase taxation, let’s decrease taxation. That’s what politicizes. That is not the fundamental science.

Lucas: I guess I just want to look this up actually just to figure out what the general American populace thinks. I think it was a bit wrong.

Roman: I don’t think it’s important what the exact percentage is. I think it’s general concept we care about.

Lucas: It’s a general concept, but I guess I was just potentially introducing a level of pessimism about why we need to educate people more so about AI alignment and AI safety in general just because these issues, even if you’re extremely skillful about them, can become politicized. Just generally the epistemology of America right now is exploding in a giant mess of bullshit. It’s just important that we educate clearly and correctly.

Roman: You don’t have to start with the most extreme examples or I don’t go with paperclip maximizers or whatever. You can talk about career selection, technological unemployment, basic income. Those things are quite understandable and they provide wonderful base for moving to the next level once we get there.

Lucas: Absolutely. Totally in agreement. How would you describe the typical interactions that you get from mainstream AI and CS researchers who just do sort of standard machine learning and don’t know or really think or care about AGI and ASI? When you talk to them and pitch to them like, “Hey, maybe you should be working on AI safety.” Or, “Hey, AI safety is something that is real, that you should care about.”

Roman: You’re right. There are different types of people based on their background knowledge. There is group one, which never heard of the concept. It’s just not part of their world. You can start by just sharing some literature and you can follow up later. Then there are people who are in complete agreement with you. They know it’s important. They understand the issue, but that’s their job they’re working and I think they are sympathetic to the cause.

Then there are people who heard a few kind of not the best attempts to explain what AI risk is, and so they are skeptical. They may be thinking about Terminator movie or something, Matrix, and so they are quite skeptical. In my personal experience, if I had a chance to spend 30 minutes to an hour with a person one-on-one, they all converted. I never had someone who went, “You told me things, but I have zero concern about intelligent systems having bugs in them or side effects or anything like that.”

I think it’s just a question of spending time and making it a friendly expedience. You’re not adversaries trying to fight it out. You’re just going, “Hey, every single piece of software we ever produced had bugs in it and can be had.” How is this different?

Lucas: I agree with you, but there are also seems to be these existence proofs and existence cases of people who are computer scientists and who are super skeptical about AI safety efforts and working on ASI safety like Andrew Ng and others.

Roman: You have to figure out each individual case-by-case basis of course, but just being skeptical about success of his approach is normal. I told you my main concern, is the problem solvable. That’s a degree of skepticism. If we looked at any other industry. Let’s say we had oil industry. The top executive oil industry said that global climate change is not important. Just call it redistribution of good weather or something, it’s not a big deal.

You would immediately think there is some sort of conflict of interest, right? But how is this different? If you are strongly dependent on development, not on anything else, it just makes sense that you would be 100% for development. I don’t think it’s unnatural at all. Again, I think a good conversation and realignment of incentives would do miracles for such cases.

Lucas: It seems like either because Andrew Ang’s timelines are so long or he just thinks that they’re fundamentally, like there’s just not really a big problem. I think there are some computer scientists, researchers who just think there’s just not really a problem, because we’re making the systems and there are systems that are so intertwined with us that the values will just naturally mesh together or something. I’m just so surprised I guess that from the mainstream CS and AI people that you don’t run into more skeptics.

Roman: I don’t start my random interactions with people by trying to tell them, “You are wrong. Change your mind.” That’s usually not the best approach. Then you talk about specific cases and you can take it slowly and increase the level of concern. You can start by talking about algorithmic justice and bias in algorithms and software verification. I think you’ll get 100% support at all those levels.

What happens when your system is slightly more capable, you’re still working with me? I don’t think there is a gap where you go, “Well, at that point, everything becomes rosy and safe and we don’t have to worry about it.” If a disagreement is about how soon, I think it’s not a problem at all. Everything I argue still applies in 20 years, 50 years, 100 years.

If you’re saying it will take 100 years to get to superintelligence, how long will it take to learn how to control a system we don’t have yet? Probably way longer than that. Already, we should have started 50 years ago. It’s too late now. If anything, it strengthens my point that we should put more resources on the safety side.

Lucas: Absolutely. Just a question about generally your work cataloging failures of AI products and what this means for the future.

Roman: I collect examples, historical examples starting with the very first AI systems, still everyday news of how AI systems fail. The examples you all heard about. Self-driving car kills a pedestrian. Or Microsoft Tay chat bot becomes racist and swears at people. I have maybe about 50 or 60 so far. I keep collecting new ones. Feel free to send me lots of cool examples, but make sure they’re not already on my list.

The interesting thing is the patterns. You can get from it, learn from it and use to predict future failure. One, obviously as AI becomes more common, we have more of those systems, the number of such failures grows. I think it grows exponentially and impacts from them grows.

Now, we have intelligent systems trading in the stock market. I think they take up something like 85% of all stock trades. We had examples where they crash the whole stock market, brought down the volume by $1 trillion or something, closed significant amounts. This is very interesting data. I try to create a data set of those examples and there is some interest from industry to understand how to make their products not make my list in the future.

I think so far the only … It sounds like a trivial conclusion, but I think it’s fundamental. The only conclusion I have is that if you design an AI system to do X, it will very soon fail to X whatever X stands for. It seems like it’s only going to get worse as they become more general because the value of X becomes not just narrow. If you designed a system to play chess, then it will fail to win a chess match. That’s obvious and trivial. But if you design the system to run the world or something like that, what is X here?

Lucas: This makes me think about failure modes. Artificial superintelligence is going to have a probability space of failure modes where the severity of the failure at the worst end … We covered this in my last podcast is it would literally be turning the universe into the worst possible suffering imaginable for everyone for as long as possible. That’s some failure mode of ASI which has some probability which is unknown. Then the opposite on the other end is going to be, I guess, the most well-being and bliss for all possible minds, which exists in that universe. Then there’s everything in between.

I guess the question is, is there any mapping or how important is it in mapping this probability space of failure modes? What are the failure modes that ASI can do or that would occur that would make it not value aligned? What are the probabilities of each of those given, I don’t know, the sort of architecture that we expect ASI to have or how we expect ASI to function?

Roman: I don’t think there is a worst and best case. I think it’s infinite in both directions. It can always get worse and always get better.

Lucas: But it’s constrained by what is physically possible.

Roman: Knowing what we know about physics and within this universe, there is a big multiverse out there possibly with different types of physics and simulated environments can create very interesting side effects as well. That’s not the point. I also collect predicted failures of future systems, part of a same report. You can look it up. That’s very interesting to see what usually a scientist, but sometimes science fiction writers, other people had said as potential examples.

It has things like paperclip maximizer and other examples. I look at predictions which are predictions but short-term. For example, we can talk about sex robots and how they’re going to fail. Someone hacks them, then they forget to stop. You forget your safe word. There are interesting possibilities.

Very useful both as an educational tool to get people to see this trend and go, “Okay. At every level of AI development, we had problems proportionate to the capability of AI. Give me a good argument why it’s not the case moving forward?” Very useful tool for AI safety researchers to predict. “Okay, we’re releasing this new system tomorrow. It’s capable of X.” How can we make sure the problems don’t follow?

I published on this, for example, before Microsoft released their Tay chatbot. Giving Xs to users to manipulate your learning data is usually not a safe option. If they just knew about it, maybe they wouldn’t embarrass themselves so bad.

Lucas: Wonderful. I guess just one last question here. My view was that given a superintelligence originating on earth, there would be a physical maximum of the amount of matter and energy which it could manipulate given our current understanding and laws of physics, which are certainly subject to change if we gain new information.

There is something which we could call, as Nick Bostrom explains, the cosmic endowment which is sort of the sphere around an intelligent species, which is running a superintelligent optimization process. Where the sphere represents the maximum amount of matter and energy, a.k.a., galaxies a superintelligence can reach before the universe expands so much that it’s no longer able to get beyond that point. Why is your view that there isn’t a potentially physical best or physical worst thing that, that optimization process could do?

Roman: Computation is done with respect to time. It may take you twice as long to compute something with the same resources, but you’ll still get that if you don’t have limits on your time. Or you create a subjective time for whoever is experiencing things. You can have computations which are not in parallel, serial computation devoted to a single task. It’s quite possible to create, for example, levels of suffering which progressively get worse I think. Again, I don’t encourage anyone experimenting with that, but it seems like things can get worse not just because of limitations, of how much computing I can do.

Lucas: All right. It’s really been a wonderful and exciting conversation Roman. If people want to check out your work or to follow you on Facebook or Twitter or wherever else, what do you recommend people go to read these papers and follow you?

Roman: I’m very active in social media. I do encourage you to follow me on Twitter, RomanYam, or on Facebook, Roman Yampolskiy. Just Google my name. My Google Scholar has all the papers and just trying to make a sell here. I have a new book coming out, Artificial Intelligence Safety and Security. It’s an edited book with all the top AI safety researchers contributing, and it’s due out in August, mid August. Already available for presale.

Lucas: Wow. Okay. Where can people get that? On Amazon?

Roman: Amazon is a great option. It’s published by CRC Press, so you have multiple options right now. I think it’s available as a softcover and hardcover, which are a bit pricey. It’s a huge book about 500 pages. Most people would publish it as a five book anthology, but you get one volume here. It should come out as a very affordable digital book as well, about $30 for 500 pages.

Lucas: Wonderful. That sounds exciting. I’m looking forward to getting my hands on that. Thanks again so much for your time. It’s really been an interesting conversation.

Roman: My pleasure and good luck with your podcast.

Lucas: Thanks so much. If you enjoyed this podcast, please subscribe, give it a like or share it on your preferred social media platform. We’ll be back again soon with another episode in the AI Alignment Series.

[end of recorded material]

Podcast: Mission AI – Giving a Global Voice to the AI Discussion with Charlie Oliver and Randi Williams

How are emerging technologies like artificial intelligence shaping our world and how we interact with one another? What do different demographics think about AI risk and a robot-filled future? And how can the average citizen contribute not only to the AI discussion, but AI’s development?

On this month’s podcast, Ariel spoke with Charlie Oliver and Randi Williams about how technology is reshaping our world, and how their new project, Mission AI, aims to broaden the conversation and include everyone’s voice.

Charlie is the founder and CEO of the digital media strategy company Served Fresh Media, and she’s also the founder of Tech 2025, which is a platform and community for people to learn about emerging technologies and discuss the implications of emerging tech on society. Randi is a doctoral student in the Personal Robotics Group at the MIT Media Lab. She wants to understand children’s interactions with AI, and she wants to develop educational platforms that empower non-experts to develop their own AI systems. 

Topics discussed in this episode include:

  • How to inject diversity into the AI discussion
  • The launch of Mission AI and bringing technologists and the general public together
  • How children relate to AI systems, like Alexa
  • Why the Internet and AI can seem like “great equalizers,” but might not be
  • How we can bridge gaps between the generations and between people with varying technical skills

Papers discussed in this episode include:

  • Druga, S., Williams, R., Resnick, M., & Breazeal, C. (2017). “Hey Google, is it OK if I Eat You?”: Initial Explorations in Child-Agent Interaction. Proceedings of the 16th ACM SIGCHI Interaction Design and Children (IDC) Conference, ACM. [PDF]
  • Stefania Druga, Randi Williams, Hae Won Park, and Cynthia Breazeal. 2018. How smart are the smart toys?: children and parents’ agent interaction and intelligence attribution. In Proceedings of the 17th ACM Conference on Interaction Design and Children (IDC ’18). ACM, New York, NY, USA, 231-240. DOI: https://doi.org/10.1145/3202185.3202741[PDF]
  • Randi Williams, Christian Vazquez, Stefania Druga, Pattie Maes, Cynthia Breazeal. “My Doll Says It’s OK: Voice-Enabled Toy Influences Children’s Moral Decisions.” IDC. 2018

You can listen to this episode above or read the transcript below. And don’t forget to check out previous episodes of FLI’s monthly podcast on SoundCloud, iTunes, Google Play and Stitcher.

 

Ariel: Hi, I am Ariel Conn with The Future of Life Institute. As a reminder, if you’ve been enjoying our podcasts, please remember to take a minute to like them, and share them, and follow us on whatever platform you listen on.

And now we’ll get on with our podcast. So, FLI is concerned with broadening the conversation about AI, how it’s developed, and its future impact on society. We want to see more voices in this conversation, and not just AI researchers. In fact, this was one of the goals that Max Tegmark had when he wrote his book, Life 3.0, and when we set up our online survey about what you want the future to look like.

And that goal of broadening the conversation is behind many of our initiatives. But this is a monumental task, that we need a lot more people working on. And there is definitely still a huge communications gap when it comes to AI.

I am really excited to have Charlie Oliver, and Randi Williams with me today, to talk about a new initiative they’re working on, called Mission AI, which is a program specifically designed to broaden this conversation.

Charlie Oliver is a New York based entrepreneur. She is the founder and CEO of Served Fresh Media, which is a digital media strategy company. And, she’s also the founder of Tech 2025, which is a platform and community for people to learn about emerging technologies, and to discuss the implications of emerging tech on our society. The mission of Tech 2025 is to help humanity prepare for, and define what that next technological era will be. And so it was a perfect starting point for her to launch Mission AI.

Randi Williams is a doctoral student in the personal robotics group at the MIT Media Lab. Her research bridges psychology, education, engineering, and robotics, to accomplish two major goals. She wants to understand children’s interactions with AI, and she wants to develop educational platforms that empower non-experts to develop their own AI systems. And she’s also on the board of Mission AI.

Randi and Charlie, thank you both so much for being here today.

Charlie: Thank you. Thank you for having us.

Randi: Yeah, thanks.

Ariel: Randi, we’ll be getting into your work here a little bit later, because I think the work that you’re doing on the impact of AI on childhood development is absolutely fascinating. And I think you’re looking into some of the ethical issues that we’re concerned about at FLI.

But first, naturally we wanna start with some questions about Mission AI. And so for example, my very first question is, Charlie can you tell us what Mission AI is?

Charlie: Well, I hope I can, right? Mission AI is a program that we launched at Tech 2025. And Tech 2025 was launched back in January of 2017. So we’ve been around for a year and a half now, engaging with the general public about emerging technologies, like AI, blockchain, machine learning, VR/AR. And, we’ve been bringing in experts to engage with them — researchers, technologists, anyone who has a stake in this. Which pretty much tends to be everyone, right?

So we’ve spent the last year listening to both the public and our guest speakers, and we’ve learned so much. We’ve been so shocked by the feedback that we’ve been getting. And to your initial point, we learned, as I suspected early on, that there is a big, huge gap between how the general public is interpreting this, and what they expect, and how researchers are interpreting this. And how corporate America, the big companies, are interpreting this, and hope to implement these technologies.

Equally, those three separate entities also have their fears, their concerns, and their expectations. We have seen the collision of all three of those things at all of our events. So, I decided to launch Mission AI to be part of the answer to that. I mean, because as you mentioned, it is a very complicated, huge problem, monumental. And what we will do with Mission AI, is to address the fact that the general public really doesn’t know anything about the AI, machine learning research that’s happening. And there’s, as you know, a lot of money, globally, being tossed — I don’t wanna say toss — but AI research is heavily funded. And with good reason.

So, we want to do three things with this program. Number one, we want to educate the general public on the AI machine learning research ecosystem. We happen to believe that it’s crucial that, in order for the general public to participate — and understand what I mean by the general public, I should say, that includes technologists. Like 30 to 35 percent of our audience are engineers, and software developers, and people in tech companies, or in companies working in tech. They also include business people, entrepreneurs, students, we have baby boomers, we have a very diverse audience. And we designed it so that we can have a diverse conversation.

So we want to give people an understanding of what AI research is, and that they can actually participate in it. So we define the ecosystem for them to keep them up to date on what research is happening, and we give them a platform to share their ideas about it, and to have conversations in a way that’s not intimidating. I think research is intimidating for a lot of people, especially academic research. We however, will be focusing more on applied research, obviously.

The second thing that we want to do is, we want to produce original research on public sentiment, which, it’s a huge thing to take on, but the more that we have moved, grown this community — and we have several thousand people in our community now, we’ve done events here, and in Toronto; we’ve done over 40 events across different topics — we are learning that people are expressing ideas, and concerns, and just things that I have been told by researchers who come in to speak at our events, it’s surprising them. So, it’s all the more important that we get the public sentiment and their ideas out. So our goal here is to do research on what the public thinks about these technologies, about how they should be implemented, and on the research that is being presented. So a lot of our research will be derivative of already existing research that’s out there.

And then number three, we want to connect the research community, the AI research community, with our community, or with the broader public, which I think is something that’s really, very much missing. And we have done this at several events, and the results are not only absolutely inspiring, everyone involved learns so much. So, it’s important, I think, for the research community to share their work with the general public, and I think it’s important for the general public to know who these people are. There’s a lot of work being done, and we respect the work that’s being done, and we respect the researchers, and we want to begin to show the face of AI and machine learning, which I think is crucial for people to connect with it. And then also, that extends to Corporate America. So the research will also be available to companies, and we’ll be presenting what we learn with them as well. So that’s a start.

Ariel: Nice. So to follow up on that a little bit, what impact do you hope this will have? And Randi, I’d like to get your input on some of this as well in terms of, as an AI researcher, why do you personally find value in trying to communicate more with the general public? So it’s sort of, two questions for both of you.

Randi: Sure, I can hop in. So, a lot of what Charlie is saying from the researcher’s side, is a big question. It’s a big unknown. So actually a piece of my research with children is about, well when you teach a child what AI is, and how it works, how does that change their interaction with it?

So, if you were extend that to something that’s maybe more applicable to the audience — if you were to teach your great, great grandma about how all of the algorithms in Facebook work, how does that change the way that she posts things? And how does that change the way that she feels about the system. Because we very much want to build things that are meaningful for people, and that help people reach their goals and live a better life. But it’s often very difficult to collect that data. Because we’re not huge corporations, we can’t do thousand person user studies.

So, as we’re developing the technology and thinking about what directions to go in, it’s incredibly important that we’re hearing from the baby boomers, and from very young people, from the scientists and engineers who are maybe in similar spaces, but not thinking about the same things, as well as from parents, teachers, all of the people who are part of the conversation.

And so, I think what’s great about Mission AI is that it’s about access, on both ends.

Charlie: So true. And you know, to Randi’s point, the very first event that we did was January the 11th, 2017, and it was on chatbots. And I don’t know if you guys remember, but that doesn’t seem like a long time ago, but people really didn’t know anything about chatbots back then.

When we had the event, which was at NYU, it sold out in record time, like in two days. And when we got everybody in the room, it was a very diverse audience. I mean we’re talking baby boomers, college students, and the first question I asked was, “How many people in here are involved in some way with building, or developing chatbots, in whatever way you might be?” And literally I would say about, 20 to 25 percent of the hands went up.

For everyone else, I said, “Well, what do you know chatbots? What do you know about it?” And most said, “Absolutely nothing.” They said, “I don’t know anything about chatbots, I just came because it looked like a cool event, and I wanna learn more about it.”

But, by the end of the event, we help people to have these group discussions and solve problems about the technologies, together. So that’s why it’s called a think tank. At the end of the event there were these two guys who were like 25, they had a startup that works with agencies that develop chatbots for brands. So they were very much immersed in the space. After the event, I would say a week later, one of them emailed me and said, “Charlie, oh my God, that event that you did, totally blew our minds. Because we sat in a group with five other people, and one of those people was John. He’s 75 years old. And he talked to us.” Part of the exercise that they had to do was to create a Valentine’s Day chatbot, and to write the conversational flow of that chatbot. And he said that after talking to John, who’s 75 years old, about what the conversation would be, and what it should be, and how it can resonate with real people, and different types of people. He said that they realized they had been building chatbots incorrectly all along. He realized that they were narrowing their conversations, in the conversational flows, in a way that restricted their technology from being appealing to someone like him. And they said that they went back, and re-did a lot of their work to accommodate that.

So I thought that was great. I think that’s a big thing in terms of expectations. We want to build these technologies so that they connect with everyone. Right?

Ariel: I’d like to follow up with that. So there’s basically two sides of the conversation. We have one side, which is about educating the public about the current state, and future of artificial intelligence. And then, I think the other side is helping researchers better understand the impact of their work by talking to these people who are outside of their bubbles.

It sounds to me like you’re trying to do both. I’m curious if you think both are either, equally challenging, or easy to address, or do you think one side is harder? How do you address both sides, and effect change?

Charlie: That is a great, great question. And I have to tell you that on both sides, we have learned so much, about both researchers, and the general public. One of the things that we learned is that we are all taking for granted what we think we know about people. All of us. We think we’ve got it down. “I know what that student is thinking. I know what that black woman is thinking. I know how researchers think.” The fact of the matter is, we are all changing so much, just in the past two to three years, think about who you were three years ago. We have changed how we think about ourselves and the world so much in the past two years, that it’s pretty shocking, actually. And even within the year and a half that we have been up and going, my staff and I, we sit around and talk about it, because it kind of blows our minds. Even our community has changed how they think about technologies, from January of last year, to today. So, it’s actually extremely, extremely difficult. I thought it would get easier.

But here’s the problem. Number one, again, we all make assumptions about what the public is thinking. And I’m gonna go out on a limb here and say that we’re all wrong. Because they are changing the way that they think, just as quickly as the technologies are changing. And if we don’t address that, and meet that head on, we are always going to be behind, or out of sync, with what the general public is thinking about these technologies. And I don’t think that we can survive. I don’t think that we can actually move into the next era of innovation unless we fix that.

I will give you a perfect example of that. Dr. James Phan co-created the IBM Watson Q&A system. And he’s one of our speakers. He’s come to our events maybe two or three times to speak.

And he actually said to me, as I hear a lot from our researchers who come in, he says, “My God, Charlie, every time I come to speak at your event, I’m blown away by what I hear from people.” He said, “It seems like they are thinking about this very differently.” He says, “If you ask me, I think that they’re thinking far more in advance than we think that they are.”

And I said, “Well, that shocks me.” And so, to give you a perfect example of that, we did an event with Ohio State regarding their Opioid Technology Challenge. And we had people in New York join the challenge, to figure out AI technologies that could help them in their battle against opioid addiction in their state. And I had him come in, as well as several other people come in, to talk about the technologies that could be used in this type of initiative. And James is very excited. This is what I love about researchers, right? He’s very excited about what he does. And when he talks about AI, he lights up. I mean you’ve just never seen a man so happy to talk about it. So he’s talking to a room full of people who are on the front lines of working with people who who are addicted to opioids, or have some sort of personal connection it. Because we invited people like emergency responders, we invited people who are in drug treatment facilities, we’ve invited doctors. So these are people who are living this.

And the more he talked about algorithms, and machine learning, and how they could help us to understand things, and make decisions, and they can make decisions for us, the angrier people got. They became so visibly angry, that they actually started standing up. This was in December. They started standing up and shouting out to him, “No way, no way can algorithms make decisions for us. This is about addiction. This is emotional.” And they really, it shocked us.

I had to pull him off the stage. I mean, I didn’t expect that. And he didn’t see it, because he just kept talking, and I think he felt like the more he talked about it, the more excited they would become, like him, but it was quite the contrary, they became angrier. That is the priceless example, perfect example, of how the conversations that we have, that we initiate between researchers and the public, are going to continue to surprise us. And they’re going to continue to be shocking, and in some cases, very uncomfortable. But we need to have them.

So, no it is not easy. But yes we need to have them. And in the end, I think we’re all better for it. And we can really build technologies that people will embrace, and not protest.

Ariel: So Randi, I’d like to have you jump in now, because you’ve actually done, from the researcher side, you’ve done an event with Tech 2025, or maybe more than one, I’m not sure. So I was hoping you could talk about your experience with that, and what you gained out of it.

Randi: Yeah, so that event I was talking about a piece of research I had done, where I had children talk about their perceptions of smart toys. And so this is a huge, also, like Charlie was saying, inflammatory topic because, I don’t know, parents are extremely freaked out. And I think, no offense to the media, but there’s a bit of fear mongering going on around AI and that conversation. And so, as far as what’s easier, I think the first step, what makes it really difficult for researchers to talk to the public right now, is that we have been so far out of the conversation, that the education has gotten skewed. And so it’s difficult for us to come in and talk about algorithms, and machines making decisions, without first dealing with, you know, and this is okay, and it’s not a terminator kind of thing. At the end of the day, humans are still in control of the machines.

So what was really interesting about my experience, talking with Tech 2025, is that, I had all of these different people in the room, a huge variety of perspectives. And the biggest thing to hear, was what people already knew. And, as I was talking and explaining my research, hearing their questions, understanding what they understood already, what they knew, and what wasn’t so clear. So one of the biggest things is, when you see an AI system teach itself to play chess, and you’re like, “Oh my God, now it’s gonna teach itself to like, take over a system, and hack into the government, and this is that.” And it’s like, no, no, it’s just chess. And it’s a huge step to get any further than that.

And so it was really great practice for me to try and take people who are in that place, and say, “Well no, actually this is how the technology works, and this is the limitations.” And try to explain, you know, so when could this happen, in what particular universe could this happen? Well maybe, like in 20 years if we find a general AI, then yeah, it could teach itself to solve any problem. But right now, every single problem requires years of work.

And then seeing what metaphors work. What metaphors make sense for an AI scientist who wants to relate to the public. What things click, which things don’t click? And I think, another thing that happened, that I really loved was, just thinking about the application space. I’m asking research questions that I think are intellectually interesting for my work. But, there was a person from a company, who was talking about implementing a skill in Alexa, and how they didn’t know if using one of their characters on Alexa, would be weird for a child. Because, I was talking about how children look at an Alexa, and they think Alexa’s like a person. So Alexa is an Alexa, and if you talk to another Alexa, that’s a new Alexa. Yeah they have the same name, but completely different people, right?

So what happens when Alexa has multiple personality disorder? Like how does a child deal with that? And that was a question that never would have come up, because I’m not writing skills with different characters for children. So, that’s just an example of how learning as an AI scientist, how to give, how to listen to what people are trying to understand, and how to give them the education they need. But then also taking, okay, so when you’re at home and your child is doing xyz with Alexa, where are the questions there that you have, that researchers should be trying to answer? So, I don’t know which one is harder.

Charlie: I specifically went after Randi for this event. And I invited her because, I had been thinking in my mind for a while, that we are not talking about children in AI, not nearly enough. Considering that they’re gonna be the ones in ten to 15 years who are gonna be developing these things, and this technology and everything. So I said, “You know, I am willing to bet that children are thinking very differently about this. Why aren’t we talking about it?” So, I get online, I’m doing all my, as anyone would, I do all my little research to try to figure it out, and when I came across Randi’s research, I was blown away.

And also, I had her in mind with regards to this because I felt like this would be the perfect test of seeing how the general public would receive research, from a research assistant who is not someone who necessarily has — obviously she’s not someone who has like 20 years of experience behind her, she’s new, she’s a fresh voice. How would she be received? How would the research be received?

And on top of that, to be honest with you, she’s a young black woman. Okay? And in terms of diversity of voices within the research community, and within the AI discussion as a whole, this is something I want to address, aggressively.

So we reached out to the toy companies, we reached out to child psychologists, teachers, students, children’s museums, toy stores, I can’t tell you how many people we reached out to in the greater New York City area.

Randi was received so well, that I had people coming up to me, and high fiving me, saying, “Where did you get her? Where did you find her?” And I’m like, “Well you know, she didn’t drop out of the sky. She’s from MIT.”

But Randi’s feedback was crucial for me too because, I don’t know what she’s getting from it. And we cannot be effective at this if we are not, all of us, learning from each other. So if my researchers who come in and speak aren’t learning, I’m not doing my job. Same with the audience.

Ariel: So, Randi, I’m gonna want to start talking about your research here in a minute, ’cause we’ve just gotten a really great preview of the work you’re doing. But before we get to that, one, not final question, but for a little bit, a final question about Mission AI, and that is this idea of diversity.

AI is not a field that’s known for being diverse. And I read the press release about this, and the very first thing, in the very first bullet point, about what Mission AI is going to do, was about injecting diversity. And so my question to both of you is, how can we do that better? How can the AI community do that better? And in terms of the dialogue for who you’re reaching out to, as well, how can we get more voices?

Randi: You know in some ways, it’s like, there’s nothing you can do, to not do better. I think what Mission AI is really about, is thinking about who’s coming to the table to hear these things, very critically. And being on the board, as Charlie said, a black woman, the people who I talk to in AI are people of color, and women, right? So, I hope that as being a main part of this, and having Charlie also be a main part of that, we have a network that’s both powerful, in terms of having the main players in AI come to the table, but you know, main players that are also not, I guess the stereotypical AI scientist that you would think of.

So, what makes this different is who’s leading it, and the fact that we’re thinking about this from the very beginning. Like, “Okay, we’re gonna reach out. We want to recruit research scientists,” so I’m thinking of my peers who are in schools all across the country, and what they’re doing, and how this can be meaningful for them, and how they can, I guess, get an experience in communicating their research with the public.

Charlie: Yeah, I totally agree.

In addition to that, bringing in people who are from different backgrounds, and bringing diversity to the speakers, is very important. But it’s equally as important to have a diverse room. The first thing that I decided when I launched Tech 2025, and the reason that I’ve decided to do it this way, is because, I did not want to have a room full of the hoodie crowd. Which is, you know, white guys in their 20’s with hoodies on. Right? That’s the crowd that usually gets the attention with regards to AI and machine learning. And no offense to them, or to what they’re doing, everyone’s contributing in their own way.

But I go to tech events, as I know you guys do too. I go to tech events here, and in San Francisco, and across the country, and different parts of the world. And, I see that for the most part a lot of these rooms are filled, especially if you talk about blockchain, and cryptocurrency, which we do as well, they’re filled with primarily white guys.

So, I intentionally, and aggressively, made it a point to include as many people from various backgrounds as possible. And it is a very deliberate thing that you have to do, starting with the content. I don’t think a lot of people realize that, because people say to me, “How do you get such diverse people in the room?”

Well number one, I don’t exclude anyone, but also, the content itself asks people from various backgrounds to come in. So, a lot of times, especially in our earlier events, I would make a point of saying, it doesn’t matter who you are, where you’re from, we don’t care if you’re a technologist, or if you are a baby boomer who’s just curious about this stuff, come on in. And I have actually had people in their 60s come to me, I had a woman come to me last year, and she says, “My God Charlie, I feel like I really can participate in these discussions at your event. I don’t feel like I’m the odd woman out, because I’m older.”

So I think that’s a very important thing, is that, when researchers look at the audience that they’re talking to, they need to see diversity in that audience too. Otherwise, you can reinforce the biases that we have. So if you’re a white guy and you’re talking to an audience full of nothing but white guys, you’re reinforcing that bias that you have about what you are, and the importance of your voice in this conversation.

But when my guests come in to speak, I tell them first and foremost, “You are amazing. I love the work that you do, but you’re not the … The star of the show is the audience. So when you look at them, just know that they are, it’s very important that we get all of their feedback. Right? That we allow them to have a voice.” And it turns out that that’s what happens, and I’m really, I’m happy that we’re creating a dialogue between the two. It’s not easy. I think it’s definitely what needs to happen. And with going back to what Randi says, it does need to be deliberate.

Ariel: I’m going to want to come back to this, because I want to talk more about how Mission AI will actually work. But I wanna take a brief pause, because we’ve sort of brought up some of Randi’s work, and I think her work is really interesting. So I wanted to talk, just a little bit about that, since the whole idea of Mission AI is to give a researcher a platform to talk about their work too.

So, one of my favorite quotes ever, is the Douglas Adams quote about age and technology, and he says, “I’ve come up with a set of rules that describe our reactions to technologies. One, anything that is in the world when you’re born, is normal and ordinary and is just a natural part of the way the world works. Two, anything that’s been invented when you’re 15 to 35 is new, and exciting, and revolutionary, and you can probably get a career in it. Three, anything invented after you’re 35 is against the natural order of things.”

Now, I personally, I’m a little bit worried that I’m finding that to be the case. And so, one of things that I’ve found really interesting is, we watch these debates about what the impact of AI will be on future generations. There are technologies that can be harmful, period. And trying to understand, when you’re looking at a technology that can be harmful, versus when you’re looking at a technology and you just don’t really know what the future will be like with it, I’m really curious what your take on how AI will impact children as they develop, is. You have publications that, there’s at least a couple great titles. One is, “Hey Google, is it okay if I eat you?” And then another is, “My Doll Says It’s Okay, Voice Enabled Toy Influences Children’s Moral Decisions.”

So, my very first question for you is, what are you discovering so far with the way kids interact with technology? Is there a reason for us to be worried? Is there also reason for us to be hopeful?

Randi: So, now that I’m hearing you say that, I’m like, “Man I should edit the titles of my things.”

First, let me label myself as a huge optimist of AI. Obviously I work as an AI scientist. I don’t just study ethics, but I also build systems that use AI to help people reach their goals. So, yeah, take this with a grain of salt, because obviously I love this, I’m all in it, I’m doing a PhD on it, and that makes my opinion slightly biased.

But here’s what I think, here’s the metaphor that I like to use when I talk about AI, it’s kind of like the internet. When the internet was first starting, people were like, “Oh, the Internet’s amazing. It’s gonna be the great equalizer, ’cause everyone will be able to have the same education, ’cause we’ll all have access to the same information. And we’re gonna fix poverty. We’re gonna fix, everything’s gonna go away, because the internet.” And in 2018, the Internet’s kind of like, yeah, it’s the internet, everyone has it.

But it wasn’t a great equalizer. It was the opposite. It’s actually creating larger gaps in some ways, in terms of people who have access to the internet, and can do things, and people who don’t have access. As well as, what you know about on the internet makes a huge difference in your experience on it. It also in some ways, promotes, very negative things, if you think about like, the dark web, modern day slavery, all of these things, right? So it’s like, it’s supposed to be great, it’s supposed to be amazing. It went horribly wrong. AI is kind of like that. But maybe a little bit different in that, people are already afraid of it before it’s even had a chance.

In my opinion, AI is the next technology that has the potential to be a great equalizer. The reason for that is, because it’s able to extend the reach that each person has in terms of their intellectual ability, in terms of their physical ability. Even, in terms of how they deal with things emotionally and spiritually. There’s so many places that it can touch, if the right people are doing it, and if it’s being used right.

So what’s happening right now, is this conversation with children in AI. The toy makers, and the toy companies are like, “We can create a future where every child grows up, and someone is reading to them, and we’re solving all the problems. It’s gonna be great.” And then they say to the parents, “I’m gonna put this thing in your home, and it’s gonna record everything your child says, and then it’s gonna come back to our company, and we’re gonna use it to make your life better. And you’re gonna pay us for it.” And parents are like, “I have many problems with this. I have many, many problems with everything that you’re saying.”

And so, there’s this disconnect between the potential that AI has, and the way that it’s being seen as the public, because, people are recognizing the dangers of it. They’re recognizing that the amount of access that it has, is like, astronomical and crazy. So for a second, I’ll talk about the personal robots group. In the MIT Media Lab, the personal robots group, we specifically build AI systems that are humanistic. Meaning that we’re looking at the way that people interact with their computers, and with cellphones, and it’s very, cagey. It’s very transactional, and in many ways it doesn’t help people live their lives better, even though it gives them more access. It doesn’t help them achieve all of their goals. Because you know, in some ways it’s time consuming. You see a group of teenagers, they’re all together, but they’re all texting on phones. It’s like, “Who are you talking to? Talk to your friends, they’re right there.” But that’s not happening, so we built systems specifically, that try to help people achieve their goals. One great example of that, is we found educational research that says that your vocabulary at the age of five, is a direct predictor of your PSAT score in the 11th grade. And as we all know, your PSAT score is a predictor of your SAT score. Your SAT score is a predictor of your future income, and potential in life, and all these great things.

So we’re like, “Okay, we wanna build a robot that helps children, who may not have access for any number of reasons, be able to increase their vocabulary size.” And we were gonna use AI that can personalize to each child, because every child’s different. Some children want the competitive robot that’s gonna push them, some children want the friendly robot that’s gonna work with them, and ask them questions, and put them in the perspective of being a teacher. And, AI is the only thing, like in a world, where classroom sizes are getting bigger, where parents can’t necessarily spend as much time at home, those are the spaces where we’re like, AI can help. And so we build systems that do that.

We don’t just think about teaching this child vocabulary words. We think about how the personality of the robot is shaping the child as a learner. So how is the robot teaching the child to have a growth mindset, and teaching them to persevere, to continue learning better. So those are the kinds of things that we want to instill, and AI can do that.

So, when people say, “AI is bad, it’s evil.” We’re like, “Well, we’re using a robot that teaches children that working hard is more important than just being magically smart.” ‘Cause having a non-growth mindset, like, “I’m a genius,” can actually be very limiting ’cause when you mess up, then you’re like, “I’m not a genius. I’m stupid.” It’s like, no, work hard, you can figure things out.

So, personally, I think, that kind of AI is extremely impactful, but the conversation that we need to have now, is how do we get that into the public space, in an appropriate way. So maybe, huge toy companies shouldn’t be the ones to build it, because they obviously have a bottom line that they’re trying to fill. Maybe, researchers are the ones who wanna build it. My personal research is about helping the public build their own AI systems to reach these goals. I want a parent to be able to build a robot for their child, that helps the child better reach their goals. And not to replace the parent, but you know, there are just places where a parent can’t be there all the time. Play time, how can play time, how can the parent, in some ways, engineer their child’s play time, so that they’re helping the child reinforce having a growth mindset, and persevering, and working hard, and maybe cleaning up after yourself, there are all these things.

So if children are gonna be interacting with it anyways, how can we make sure that they’re getting the right things out of that?

Ariel: I’d like to interject with a question real quick. You’d mentioned earlier that parents aren’t psyched about having all of their kids’ information going back to toy companies.

Randi: Yeah.

Ariel: And so, I was gonna ask if you see ways in which AI can interact with children that doesn’t have to become basically massive data dumps for the AI companies? Is this, what you’re describing, is that a way in which parents can keep their children’s data private? Or would that still end up, all that data go someplace?

Randi: The way that the AI works depends heavily on the algorithm. And what’s really popular right now, are deep learning algorithms. And deep learning algorithms, they’re basically, instead of figuring out every single rule, like instead of hard programming every single possible rule and situation that someone could run into, we’re just gonna throw a lot of data at it, and the computer will figure out what we want at the end. So you tell it, what you have at the beginning, you tell it what you want at the end, and then the computer figures out everything.

That means you have to have like massive amounts of data, like, Google amounts of data, to be able to do that really well. So, right now, that’s the approach that companies are taking. Like, collect all the data, you can do AI with it, and we’re off to the races.

The systems that we’re building are different because, they rely on different algorithms than ones that require huge amounts of data. So we’re thinking about, how can we empower people so that … You know, it’s a little bit harder, you have to spend some time, you can’t just throw data at it, but it allows people to have control over their own system.

I think that’s hugely important. Like, what if Alexa wasn’t just Alexa; Alexa was your Alexa? You could rename her, and train her, and things like that.

Charlie: So, to Randi’s point, I mean I really totally agree with everything that she’s saying. And it’s why I think it’s so important to bring researchers, and the general public, together. Literally everything that she just said, it’s what I’m hearing from people at these events. And the first thing that we’re hearing is that people, obviously they’re very curious, but they are also very much afraid. And I’m sometimes surprised at the level of fear that comes into the room. But then again, I’m not, because the reason, I think anyway, that people feel so much fear about AI, is that they aren’t talking about it enough, in a substantive way.

So they may talk about it in passing, they may hear about it, or read about it online. But when they come into our events, we force them to have these conversations with each other, looking each other in the eye, and to problem solve about this stuff. And at the end of the evening, what we always hear, from so many people, is that number one, they didn’t realize that, it wasn’t as bad as they thought it was.

So there’s this realization that once they begin to have the conversations, and begin to feel as if they can participate in the discussion, then they’re like, “Wow, this is actually pretty cool.” Because part of our goal is to help them to understand, to Randi’s point, that they can participate in developing these technologies. You don’t have to have an advanced degree in engineering, and everything. They’re shocked when I tell them that, or when they learn it for themselves.

And the second thing, to Randi’s point, is that, people are genuinely excited about the technologies, after they talk about it enough to allow their fears to dissipate. So, the immediate emotional reaction to AI, and to the fear of data, and it’s a substantive fear, because they’re being told by the media that they, you know, they should be afraid. And to some degree, obviously, there is a big concern about this. But once they are able to talk about this stuff, and to do the exercises, and to think through these things, and to ask questions of the guest speakers and researchers, they then start asking us, and emailing us, saying “What more can I do? I wanna do more. Where can I go to learn more about this?”

I mean we’ve had people literally up-skill, just go take courses in algorithms and everything. And so one of the things that we’ve done, which is a a part of Mission AI is, we now have an online learning series called, Ask the Experts, where we will have AI researchers, answer questions about things that people are hearing and seeing in the news. So we’ll pick a hot topic that everyone is talking about, or that’s getting a lot of play, and we will talk about that from the perspective of the researcher. And we’ll present the research that either supports the topic, or the particular angle that the reporter is taking, or refutes it.

So we actually have one coming up on algorithms, and on YouTube’s algorithm, it’s called, Reverse Engineering YouTube’s Algorithms, and it talks about how the algorithms are causing the YouTube creators a lot of anxiety, because they feel like the algorithm is being unfair to them, as they say it. And that’s a great entry point for people, for the general public, to have these discussions. So researchers will be answering questions that I think we all have.

Ariel: So, I’m hesitant to ask this next question, because I do, I like the idea of remaining hopeful about technology, and about AI. But, I am curious as to whether or not, you have found ethical issues regarding children’s interactions with artificial intelligence, or with Alexa, or any of the other AIs that they might be playing with?

Randi: Of course there are ethical issues. So, I guess to talk specifically about the research. I think there are ethical issues, but they raise more questions than answers. So, in the first study that we did, the Hey Google, is it Okay if I Eat You? We would see things like, some of the older children thought that Alexa was smarter than them, because it could answer all of their questions. But then conversely, the younger children would say, “Well it’s not smarter than me, because it doesn’t know what my favorite song is,” or it doesn’t know about, some TV show that they watch. And so, that led us to ask the question, well what does it mean when a child says that something is more intelligent than them?

And so we followed up with a study that was also recently published. So we had children compare the intelligence of a mouse, to the intelligence of a robot, to their own intelligence. And the way that we did this was, all three of them solved a maze. And then we listened to the way that children talked about each of the different things as they were solving the maze. So first of all, the children would say immediately, “The robot solved it the best. It’s the smartest.” But what we came to realize, was that, they just thought robots were smart in general. Like that was just the perception that they had, and it wasn’t actually based on the robot’s performance, because we had the mouse and the robot do the exact same performance. So they would say, “Well the mouse just smells the cheese, so that’s not smart. But the robot, was figuring it out, it had programming, so it’s very smart.”

And then when they looked at their own intelligence, they would be able to think about, and analyze their strategy. So they’re like, “Well I would just run over all the walls until I found the cheese,” or, “I would just, try not to look at places that I had been to before.” But they couldn’t talk about the robot in the same way. Like, they didn’t intellectually understand the programming, or the algorithm that was behind it, so they just sort of saw it as some mystical intelligence, and it just knew where the cheese was, and that’s why it was so fast. And they would be forgiving of the robot when it made mistakes.

And so, what I’m trying to say, is that, when children even say, “Oh that thing is so smart,” or when they say, “Oh I love my talking doll,” or, “Oh I love Alexa, she’s my best friend.” Even when they are mean to Alexa, and do rude things, a lot of parents look at that and they say, “My child is being brainwashed by the robots, and they’re gonna grow up and not be able to socialize, ’cause they’re so emotionally dependent on Alexa.”

But, our research, that one, and the one that we just did with the children’s conformity, what we’re finding is that, children behave very differently when they interact with humans, than when they interact with these toys. And, it’s like, even if they are so young, ’cause we work with children from four to ten years old. Even if they’re four years old, and they can’t verbalize how the robot is different, their behavior is different. So, at some subconscious level, they’re acknowledging that this thing is not a human, and therefore, there are different rules. The same way that they would if they were interacting with their doll, or if they were interacting with a puppy, or a piece of food.

So, people are very freaked out, because they’re like “Oh these things are so lifelike, and children don’t know the difference, and they’re gonna turn into robots themselves.” But, mostly what I’ve seen in my research is that we need to give children more credit, because they do know the differences between these things, and they’re very curious and explorative with them. Like, we asked a six year old girl, “What do you want to build a robot for, if you were to build one?” And she was like, “Well I want one to go to countries where there are poor people, and teach them all how to read and be their friend, because some people don’t have friends.” And I was just like, “That’s so beautiful. Why don’t you grow up and start working in our lab now?”

And it’s very different from the kind of conversation that we would have with an adult. The adult would be like, “I want a robot that can do all my work for me, or that can fetch me coffee or beer, or drive my car.” Children are on a very different level, and that’s because they’re like native to this technology. They’re growing up with it. They see it for what it is.

So, I would say, yes there are ethical issues around privacy, and yes we should keep monitoring the situation, but, it’s not what it looks like. That’s why it’s so important that we’re observing behavior, and asking questions, and studying it, and doing research that concretely can sort of say, “Yeah, you should probably be worried,” or, “No, there’s something more that’s going on here.”

Ariel: Awesome, thank you. I like the six year old’s response. I think everyone always thinks of children as being selfish too, and that’s a very non-selfish answer.

Randi: Yeah. Well some of them also wanted robots to go to school for them. So you know, they aren’t all angels, they’re very practical sometimes.

Ariel: I want to get back to one question that I didn’t get a chance to ask about Mission AI that I wanted to. And that’s sort of the idea of, what audiences you’re going to reach with it, how you’re choosing the locations, what your goals specifically are for these initial projects?

Charlie: That’s a question, by the way, that I have struggled with for quite some time. How do we go about doing this? It is herculean, I can’t reach everyone. You have to have some sort of focus, right? It actually took several months to come to the conclusion that we came to. And actually that only happened after research was, ironically, research was published last month in three states on how AI automation is going to impact specific jobs, or specific sectors in three states that are aggressively trying to sort of address this now and trying to educate their public now about what this stuff is.

And from what I’ve read, I think these three states, in their legislation, they feel like they’re not getting the support maybe, that they need or want, from their federal government. And so they figured, “Let’s figure this out now, before things get worse, for all we know. Before people’s concerns reach a boiling point, and we can’t then address it calmly, the way we should.” So those states are Arizona, Indiana, and northeast Ohio. And all three, this past month, released these reports. And I thought to myself, “Well, where’s the need the most?” Because there’s so many topics here that we can cover with regards to research in AI, and everything. And this is a constant dialogue that I’m having also with my advisors, and our advisors, and people in the industries. So the idea of AI and jobs, and the possibility of AI sort of decimating millions of jobs, we’ve heard numbers all over the place; realistically, yes, jobs will go away, and then new jobs will be created. Right? It’s what happens in between that is of concern to everyone. And so one of the things in making this decision that I’ve had to look at, is what I am hearing from the community? What are we hearing that is of the greatest concern from both the general public, from the executives, and just from in general, even in the press? What is the press covering exhaustively? What’s contributing to people’s fears?

And so we’ve found that it is without a doubt, the impact of AI on jobs. But to go into these communities, where number one, they don’t get these events the way we get them in New York and San Francisco. We were never meant to be a New York organization. It was always meant to launch here, and then go where the conversation is needed. I mean, we can say it’s needed everywhere, but there are communities across this country where they really need to have this information, and this community, and in their own way. I’m in no way thinking that we can take what we do here in New York, and retrofit for every other community, and every other state. So this will be very much a learning process for us.

As we go into these different states, and we take the research that they have done on what they think the impact if AI and automation will be on specific jobs? We will be doing events in their communities, and gathering our own research, and trying to figure out the questions that we should be asking of people, at these events that will offer insight for them, for the researchers, and for the legislators.

The other thing that I would say, is that we want to begin to give people actionable feedback on what they can do. Because people are right now, very, very much feeling like, “There’s gotta be something else that I can do.” And understand that there’s a lot of pressure.

As you know, we’re at an all time low, with regards to employment, unemployment. And the concern of the executive today is that, “Oh my God, we’re going to lose jobs.” It’s, “Oh my God, how do I fill these jobs?” And so, they have a completely different mindset about this. And their goal is, “How do we up skill people? How do we prepare them for the jobs that are there now, and the ones that are to come?”

So, the research will also hopefully touch on that as well, because that is huge. And I don’t think that people are seeing the opportunities that are available to them in these spaces, and in adjacent spaces to develop the technologies. Or to help define what they might be, or to contribute to the legislative discussion. That’s another huge thing that we are seeing as a need.                    

Again, we want this to fill a need. I don’t want to in any way, dictate something that’s not going to be of use to people. And to that end, I welcome feedback. This is an open dialogue that we’re having with the community, and with businesses, and with of course, our awesome advisors, and the researchers. This is all the more of the reason too, why it’s important to hear from the young researchers. I am adamant on bringing in young researchers. I think they are chomping at the bit, to sort of share their ideas, and to get out there some of the things that they may not be able to share.

That’s pretty much the crux of it, is to meet the demand, and to help people to see how they can participate in this, and why the research is important. We want to emphasize that.

Ariel: A quick follow up for Randi, and that is, as an AI researcher what do you hope to get out of these outreach efforts?

Randi: As an AI researcher, we often do things that are public facing. So whether it be blog posts, or videos, or actually recruiting the public to do studies. Like recently we had a big study that happened in the lab, not in my group, but it was around the ethics of self driving cars. So, for me, it’s just going out and making sure that there are more people a part of the conversation than typically would be. Because, at the end of the day, I am based in MIT. So the people who I am studying are a select group of people. And I very much want to use this as a way to get out of that bubble, and to reach more people, hear their comments, hear their feedback, and design for them.

One of the big things I’ve been doing is trying to go, literally out of this country, to places where everyone doesn’t have a computer in their home, and think about, you know “Okay, so where does AI education, how does it make sense in this context?” And that’s what I think a lot of researchers want. ‘Cause this is a huge problem, and we can only see little bits of it as research assistants. So we want to be able to see more and more.

Charlie: I know you guys at the The Future of Life Institute have your annual conference on AI, and you produced the document a year ago, with 100 researchers or scientists on the Asilomar Principles.

Ariel: Yup.

Charlie: We took that document, that was one of the documents that I looked at, and I thought, “Wow this is fascinating.” So these are 23 principles, that some of the most brilliant minds in AI are saying that we should consider, when developing these technologies. Now, I know it wasn’t perfect, but I was also taken aback by the fact that the media was not covering it. And they did cover it, of course they announced it, it’s big. But there wasn’t any real critical discussion about it, and I was alarmed at that. ‘Cause I said, “This should be discussed exhaustively, or at least it should be sort of the impetus for a discussion, and there was none.”

So I decided to bring that discussion into the Tech 2025 community, and we had Dr. Seth Baum who is the executive director at the Global Catastrophic Risk Institute come in, and present what these 23 principles are, his feedback on them, and he did a quick presentation. It was great. And then we turned over to the audience, two problems, and one was, what is the one thing in this document that you think is so problematic that it should not be there? And number two, what should be there in its place?

It turned out to be a very contentious, really emotional discussion. And then when they came up with their answers, we were shocked at the ideas that they came up with, and where they felt the document was the most problematic. The group that came up with the solution that won the evening, ’cause sometimes we give out prizes depending on what it is, or we’ll ask the guest speaker to pick the solution that resonated the most with him. The one that resonated the most with Seth was a solution that Seth had never even considered, and he does this for a living, right?

So we hear that a lot from researchers, to Randi’s point. We actually hear from researchers who say, “My God, they’re people who are coming up with ideas, and I haven’t even considered.” And then on top of that, when we ask people, well what do you think about this document? Now this is no offense to the people who came up with this document, but they were not happy about it. And they all expressed that they were really concerned about the idea that anyone would be dictating what the morals or ethics of AI, or algorithms should be. Because the logical question is, whose morals, whose ethics, who dictates it, who polices it? That’s a problem.

And we don’t look at that as bad. I think that’s great, because that is where the dialogue between researchers, and the community, and the general public, that’s where to me, to becomes a beautiful thing.

Ariel: It does seem a little bit unfortunate since the goal of the document was in part, to acknowledge that you can’t just have one group of people saying, “These are what morals should be.” I’m concerned that people didn’t like it because, it was, sounds like it was misinterpreted, I guess. But that happens. So I’m gonna ask one last round up question to both of you. As you look towards a future with artificial intelligence, what are you most worried about, and what are you most excited about?

Randi: So, I’m most worried that a lot of people won’t have access to the benefits of AI until, like 30 years from now. And I think, we’re getting to the point, especially in business where AI can make a huge difference, like a huge difference, in terms of what you’re able to accomplish. And I’m afraid for that inequality to propagate in the wrong ways.

I’m most excited about the fact that, you know, at the same time as progress towards technologies that may broaden inequalities, there’s this huge push right now, for AI education. So literally, I’m in conversations with people in China, because China just made a mandate that everyone has AI education. Which is amazing. And in the United States, I think all 50 states just passed a CS requirement, and as a result, IEEE decided to start an AI K-12 initiative.

So, you know, as one of the first people in this space about AI education, I’m excited that it’s gaining traction, and I’m excited to see, you know, what we’re gonna do in the next five, ten years, that could really change what the landscape looks like right now.

Charlie: My concerns are pretty much the same with regards to who will be leveraging the technologies the most, and who will have control over them, and will the algorithms actually be biased or not. But I mean, right now, it’s unfortunate, but we have every reason to believe that the course on which we’re going, especially when we look at what’s happening now, and people realizing what’s happening with their data, my concern is that if we don’t reverse course on that, meaning become far more conscientious of what we’re doing with our own data, and how to engage companies, and how to help consumers to engage companies in discussions on what they’re doing, how they’re doing it, that we may not be able to sort of, not hit that brick wall. And I see it as a brick wall. Because if we get to the point where it is that only a few companies control all the algorithms of the world, or whatever you wanna say, I just think there’s no coming back from that. And that’s really a real fear that I have.

In terms of the hope, I think the thing that gives me hope, what keeps me going, and keeps me investing in this, and growing the community, is that, I talk to people and I see that they actually are hopeful. That they actually see that there is a possibility, a very real possibility, even though they are afraid… When people take time out of busy schedules to come and sit in a room, and listen to each other, and talk to each other about this stuff, that is the best indication that those people are hopeful about the future, and about their ability to participate in it. And so based on what I’m hearing from them, I am extremely hopeful, and I believe that there is a very huge opportunity here to do some incredible things, including helping people to see how they can reinvent the world.

We are being asked to redefine our reality, and I think some people will get that, some people won’t. But the fact that that’s being presented to us through these technologies, among other things, is to me, just exciting. It keeps me going.

Ariel: All right. Well, thank you both so much for joining us today.

Charlie: Thank you.

Randi: Thank you for having us.

Ariel: As I mentioned at the beginning, if you’ve been enjoying the podcasts, please take a moment to like them, share them, follow us on whatever platform you’re listening to us on. And, I will be back again next month, with a new pair of experts.

[end of recorded material]

 

 

Podcast: Astronomical Future Suffering and Superintelligence with Kaj Sotala

In a classic taxonomy of risks developed by Nick Bostrom (seen below), existential risks are characterized as risks which are both terminal in severity and transgenerational in scope. If we were to maintain the scope of a risk as transgenerational and increase its severity past terminal, what would such a risk look like? What would it mean for a risk to be transgenerational in scope and hellish in severity?

Astronomical Future Suffering and Superintelligence is the second podcast in the new AI Alignment series, hosted by Lucas Perry. For those of you that are new, this series will be covering and exploring the AI alignment problem across a large variety of domains, reflecting the fundamentally interdisciplinary nature of AI alignment. Broadly, we will be having discussions with technical and non-technical researchers across areas such as machine learning, AI safety, governance, coordination, ethics, philosophy, and psychology as they pertain to the project of creating beneficial AI. If this sounds interesting to you, we hope that you will join in the conversations by following us or subscribing to our podcasts on Youtube, SoundCloud, or your preferred podcast site/application.

If you’re interested in exploring the interdisciplinary nature of AI alignment, we suggest you take a look here at a preliminary landscape which begins to map this space.

In this podcast, Lucas spoke with Kaj Sotala, an associate researcher at the Foundational Research Institute. He has previously worked for the Machine Intelligence Research Institute, and has publications on AI safety, AI timeline forecasting, and consciousness research.

Topics discussed in this episode include:

  • The definition of and a taxonomy of suffering risks
  • How superintelligence has special leverage for generating or mitigating suffering risks
  • How different moral systems view suffering risks
  • What is possible of minds in general and how this plays into suffering risks
  • The probability of suffering risks
  • What we can do to mitigate suffering risks
In this interview we discuss ideas contained in a paper by Kaj Sotala and Lukas Gloor. You can find the paper here: Superintelligence as a Cause or Cure for Risks of Astronomical Suffering.  You can hear about this paper in the podcast above or read the transcript below.

 

Lucas: Hi, everyone. Welcome back to the AI Alignment Podcast of the Future of Life Institute. If you are new or just tuning in, this is a new series at FLI where we’ll be speaking with a wide variety of technical and nontechnical domain experts regarding the AI alignment problem, also known as the value alignment problem. If you’re interested in AI alignment, the Future of Life Institute, existential risks, and similar topics in general, please remember to like and subscribe to us on SoundCloud or your preferred listening platform.

Today, we’ll be speaking with Kaj Sotala. Kaj is an associate researcher at the Foundational Research Institute. He has previously worked for the Machine Intelligence Research Institute, and has publications in the areas of AI safety, AI timeline forecasting, and consciousness research. Today, we speak about suffering risks, a class of risks most likely brought about by new technologies, like powerful AI systems that could potentially lead to astronomical amounts of future suffering through accident or technical oversight. In general, we’re still working out some minor kinks with our audio recording. The audio here is not perfect, but does improve shortly into the episode. Apologies for any parts that are less than ideal. With that, I give you Kaj.

Lucas: Thanks so much for coming on the podcast, Kaj. It’s super great to have you here.

Kaj: Thanks. Glad to be here.

Lucas: Just to jump right into this, could you explain a little bit more about your background and how you became interested in suffering risks, and what you’re up to at the Foundational Research Institute?

Kaj: Right. I became interested in all of this stuff about AI and existential risks way back in high school when I was surfing the internet until I somehow ran across the Wikipedia article for the technological singularity. After that, I ended up reading Eliezer Yudkowksy’s writings, and writings by other people. At one point, I worked for the Machine Intelligence Research Institute, immersed in doing strategic research, did some papers on predicting AI that makes a lot of sense together with Stuart Armstrong of the Future of Humanity Institute. Eventually, MIRI’s focus on research shifted more into more technical and mathematical research, which wasn’t exactly my strength, and at that point we parted ways and I went back to finish my master’s degree in computer science. Then after I graduated, I ended up being contacted by the Foundational Research Institute, who had noticed my writings on these topics.

Lucas: Could you just unpack a little bit more about what the Foundational Research Institute is trying to do, or how they exist in the effective altruism space, and what the mission is and how they’re differentiated from other organizations?

Kaj: They are the research arm of the Effective Altruism Foundation in the German-speaking area. The Foundational Research Institute’s official tagline is, “We explain how humanity can best reduce suffering.” The general idea is that a lot of people have this intuition that if you are trying to improve the world, then there is a special significance on reducing suffering, and especially about outcomes involving extreme suffering have some particular moral priority, that we should be looking at how to prevent those. In general, the FRI has been looking at things like the long-term future and how to best reduce suffering at long-term scales, including things like AI and emerging technologies in general.

Lucas: Right, cool. At least my understanding is, and you can correct me on this, is that the way that FRI sort of leverages what it does is that … Within the effective altruism community, suffering risks are very large in scope, but it’s also a topic which is very neglected, but also low in probability. Has FRI really taken this up due to that framing, due to its neglectedness within the effective altruism community?

Kaj: I wouldn’t say that the decision to take it up was necessarily an explicit result of looking at those considerations, but in a sense, the neglectedness thing is definitely a factor, in that basically no one else seems to be looking at suffering risks. So far, most of the discussion about risks from AI and that kind of thing has been focused on risks of extinction, and there have been people within FRI who feel that risks of extreme suffering might actually be very plausible, and may be even more probable than risks of extinction. But of course, that depends on a lot of assumptions.

Lucas: Okay. I guess just to move foreward here and jump into it, given FRI’s mission and what you guys are all about, what is a suffering risk, and how has this led you to this paper?

Kaj: The definition that we have for suffering risks is that a suffering risk is a risk where an adverse outcome would bring about severe suffering on an astronomical scale, so vastly exceeding all suffering that has existed on earth so far. The general thought here is that if we look at the history of earth, then we can probably all agree that there have been a lot of really horrible events that have happened, and enormous amounts of suffering. If you look at something like the Holocaust or various other terrible events that have happened throughout history, there is an intuition that we should make certain that nothing this bad happens ever again. But then if we start looking at what might happen if humanity, for instance, colonizes space one day, then if current trends might continue, then you might think that there is no reason why such terrible events wouldn’t just repeat themselves over and over again as we expand into space.

That’s sort of one of the motivations here. The paper we wrote is specifically focused on the relation between suffering risks and superintelligence, because like I mentioned, there has been a lot of discussion about superintelligence possibly causing extinction, but there might also be ways by which superintelligence might either cause suffering risks, for instance in the form of some sort of uncontrolled AI, or alternatively, if we could develop some kind of AI that was aligned with humanity’s values, then that AI might actually be able to prevent all of those suffering risks from ever being realized.

Lucas: Right. I guess just, if we’re really coming at this from a view of suffering-focused ethics, where we’re really committed to mitigating suffering, even if we just view sort of the history of suffering and take a step back, like, for 500 million years, evolution had to play out to reach human civilization, and even just in there, there’s just a massive amount of suffering, in animals evolving and playing out and having to fight and die and suffer in the ancestral environment. Then one day we get to humans, and in the evolution of life on earth, we create civilization and technologies. In seems, and you give some different sorts of plausible reasons why, that either for ignorance or efficiency or, maybe less likely, malevolence, we use these technologies to get things that we want, and these technologies seem to create tons of suffering.

In our history so far, we’ve had things … Like you mentioned, the invention of the ship has helped lead to slavery, which created an immense amount of suffering. Modern industry has led to factory farming, which has created an immense amount of suffering. As we move foreward and we create artificial intelligence systems and potentially even one day superintelligence, we’re really able to mold the world more so into a more extreme state, where we’re able to optimize it much harder. In that optimization process, it seems the core of the problem lies, is that when you’re taking things to the next level and really changing the fabric of everything in a very deep and real way, that suffering can really come about. The core of the problem seems that, when technology is used to fix certain sorts of problems, like that we want more meat, or that we need more human labor for agriculture and stuff, that in optimizing for those things we just create immense amounts of suffering. Does that seem to be the case?

Kaj: Yeah. That sounds like a reasonable characterization.

Lucas: Superintelligence seems to be one of these technologies which is particularly in a good position to be worried it creating suffering risks. What are the characteristics, properties, and attributes of computing and artificial intelligence and artificial superintelligence that gives it this special leverage in being risky for creating suffering risks?

Kaj: There’s obviously the thing about superintelligence potentially, as you mentioned, being able to really reshape the world at a massive scale. But if we compare what is the difference between a superintelligence that is capable of reshaping the world at a massive scale versus humans doing the same using technology … A few specific scenarios that we have been looking at in the paper is, for instance, if we compare to a human civilization, then a major force in human civilizations is that most humans are relatively empathic, and while we can see that humans are willing to cause others serious suffering if that is the only, or maybe even the easiest way of achieving their goals, a lot of humans still want to avoid unnecessary suffering. For instance, currently we see factory farming, but we also see a lot of humans being concerned about factory farming practices, a lot of people working really hard to reform things so that there would be less animal suffering.

But if we look at, then, artificial intelligence, which was running things, then if it is not properly aligned with our values, and in particular if it does not have something that would correspond to a sense of empathy, and it’s just actually just doing whatever things maximize its goals, and its goals do not include prevention of suffering, then it might do things like building some kind of worker robots or subroutines that are optimized for achieving whatever goals it has. But if it turns out that the most effective way of making them do things is to build them in such a way that they suffer, then in that case there might be an enormous amount of suffering agents with no kind of force that was trying to prevent their existence or trying to reduce the amount of suffering in the world.

Another scenario is the possibility of mind-crime. This is discussed in Bostrom’s Superintelligence briefly. The main idea here is that if the superintelligence creates simulations of sentient minds, for instance for scientific purposes or the purposes of maybe blackmailing some other agent in the world by torturing a lot of minds in those simulations, AI might create simulations of human beings that were detailed enough to be conscious. Then you mentioned earlier the thing about evolution already have created a lot of suffering. If the AI were similarly to simulate evolution or simulate human societies, again without caring about the amount of suffering within those simulations, then that could again cause vast amounts of suffering.

Lucas: I definitely want to dive into all of these specific points with you as they come up later in the paper, and we can really get into and explore them. But so, really just to take a step back and understand what superintelligence is and the different sorts of attributes that it has, and how it’s different than human beings and how it can lead to suffering risk. For example, there seems to be multiple aspects here where we have to understand superintelligence as a general intelligence running at digital timescales rather than biological timescales.

It also has the ability to copy itself, and rapidly write and deploy new software. Human beings have to spend a lot of time, like, learning and conditioning themselves to change the software on their brains, but due to the properties and features of computers and machine intelligence, it seems like copies could be made for very, very cheap, it could be done very quickly, they would be running at digital timescales rather than biological timescales.

Then it seems there’s the whole question about value-aligning the actions and goals of this software and these systems and this intelligence, and how in the value alignment process there might be technical issues where, due to difficulties in AI safety and value alignment efforts, we’re not able to specify or really capture what we value. That might lead to scenarios like you were talking about, where there would be something like mind-crime, or suffering subroutines which would exist due to their functional usefulness or epistemic usefulness. Is there anything else there that you would like to add and unpack about why superintelligence specifically has a lot of leverage for leading to suffering risks?

Kaj: Yeah. I think you covered most of the things. I think the thing that they are all leading to that I just want to specifically highlight is the possibility of the superintelligence actually establishing what Nick Bostrom calls a singleton, basically establishing itself as a single leading force that basically controls the world. I guess in one sense you could talk about singletons in general and their impact on suffering risks, rather than superintelligence specifically, but at this time it does not seem very plausible, or at least I cannot foresee, very many other paths to a singleton other than superintelligence. That was a part of why we were focusing on superintelligence in particular.

Lucas: Okay, cool. Just to get back to the overall structure of your paper, what are the conditions here that you cover that must be met in order for s-risks to merit our attention? Why should we care about s-risks? Then what are all the different sorts of arguments that you’re making and covering in this paper?

Kaj: Well, basically, in order for any risk, suffering risks included, to merit work on them, they should meet three conditions. The first is that the outcome of the risk should be sufficiently severe to actually merit attention. Second, the risk must have some reasonable probability of actually being realized. Third, there must be some way for risk avoidance work to actually reduce either the probability or the severity of the adverse outcome. If something is going to happen for certain and it’s very bad, then if we cannot influence it, then obviously we cannot influence it, and there’s no point in working on it. Similarly, if some risk is very implausible, then it might not be the best use of resources. Also, if it’s very probable but wouldn’t cause a lot of damage, then it might be better to focus on risks which would actually cause more damage.

Lucas: Right. I guess just some specific examples here real quick. The differences here are essentially between, like, the death of the universe, if we couldn’t do anything about it, we would just kind of have to deal with that, then sort of like a Pascal mugging situation, where a stranger just walks up to you on the street and says, “Give me a million dollars or I will simulate 10 to the 40 conscious minds suffering until the universe dies.” The likelihood of that is just so low that you wouldn’t have to deal with it. Then it seems like the last scenario would be, like, you know that you’re going to lose a hair next week, and that’s just sort of like an imperceptible risk that doesn’t matter, but that has very high probability. Then getting into the meat of the paper, what are the arguments here that you make regarding suffering risks? Does suffering risk meet these criteria for why it merits attention?

Kaj: Basically, the paper is roughly structured around those three criteria that we just discussed. We basically start by talking about what the s-risks are, and then we seek to establish that if they were realized, they would indeed be bad enough to merit our attention. In particular, we argue that many value systems would consider some classes of suffering risks to be as bad or worse than extinction. Also, we cover some suffering risks which are somewhat less severe that extinction, but still, according to many value systems, very bad.

Then we move on to look at the probability of the suffering risks to see whether it is actually plausible that they will be realized. We survey what might happen if nobody builds a superintelligence, or maybe more specifically, if there is no singleton that could prevent suffering risks that might be realized sort of naturally, in the absence of a singleton.

We also look at, okay, if we do have a superintelligence or a singleton, what suffering risks might that cause? Finally, we look at the last question, of the tractability. Can we actually do anything about these suffering risks? There we also have several suggestions of what we think would be the kind of work that would actually be useful in either reducing the risk or the severity of suffering risks.

Lucas: Awesome. Let’s go ahead and move sequentially through these arguments and points which you develop in the paper. Let’s start off here by just trying to understand suffering risk just a little bit more. Can you unpack the taxonomy of suffering risks that you develop here?

Kaj: Yes. We’ve got three possible outcomes of suffering risks. Technically, a risk is something that may or may not happen, so three specific outcomes of what might happen. The three outcomes, I’ll just briefly give their names and then unpack them. We’ve got what we call astronomical suffering outcomes, net suffering outcomes, and pan-generational net suffering outcomes.

I’ll start with the net suffering outcome. Here, the idea is that if we are talking about a risk which might be of a comparable severity as risks of extinction, then one way you could get that is if, for instance, we look from the viewpoint of something like classical utilitarianism. You have three sorts of people. You have people who have a predominantly happy life, you have people who never exist or have a neutral life, and you have people who have a predominantly unhappy life. As a simplified moral calculus, you just assign the people with happy lives a plus-one, and you assign the people with unhappy lives a minus-one. Then according to this very simplified moral system, then you would see that if we have more unhappy lives than there are happy lives, then technically this would be worse than there not existing any lives at all.

That is what we call a net suffering outcome. In other words, at some point in time there are more people experiencing lives that are more unhappy than happy, and there are people experiencing lives which are the opposite. Now, if you have a world where most people are unhappy, then if you’re optimistic you might think that, okay, it is bad, but it is not necessarily worse than extinction, because if you look ahead in time, then maybe the world will go on and conditions will improve, and then after a while most people actually live happy lives, so maybe things will get better. We define an alternative scenario in which we just assume that things actually won’t get better, and if you sum over all of the lives that will exist throughout history, most of them still end up being unhappy. Then that would be what we call a pan-generational net suffering outcome. When summed over all the people that will ever live, there are more people experiencing lives filled predominantly with suffering than there are people experiencing lives filled predominantly with happiness.

You could also have what we call astronomical suffering outcomes, which is just that at some point in time there’s some fraction of the population which experiences terrible suffering, and the amount of suffering here is enough to constitute an astronomical amount that overcomes all the suffering in earth’s history. Here we are not making the assumption that the world would be mainly filled with these kinds of people. Maybe you have one galaxy worth of people in terrible pain, and 500 galaxy’s worth of happy people. According to some value systems, that would not be worse than extinction, but probably all value systems would still agree that even if this wasn’t worse than extinction, it would still be something that would be very much worth avoiding. Those are the three outcomes that we discuss here.

Lucas: Traditionally, the sort of far-future concerned community has mainly only been thinking about existential risks. Do you view this taxonomy and suffering risks in general as being a subset of existential risks? Or how do you view it in relation to what we traditionally view as existential risks?

Kaj: If we look at Bostrom’s original definition for an existential risk, the definition was that it is a risk where an adverse outcome would either annihilate earth-originating intelligent life, or permanently and drastically curtail its potential. Here it’s a little vague on how exactly you should interpret phrases like “permanently and drastically curtain our potential.” You could take the view that suffering risks are a subset of existential risks if you view our potential as being something like the realization of a civilization full of happy people, where nobody ever needs to suffer. In that sense, it would be a subset of existential risks.

It is most obvious with the net suffering outcomes. It seems pretty plausible that most people experiencing suffering would not be the realization of our full potential. Then if you look at something like near-astronomical suffering outcomes, where you might only have a small fraction of the population experiencing suffering, then that, depending on exactly how large the fraction, then you might maybe not count it as a subset of existential risks, and maybe something more comparable to catastrophic risks, which have usually been defined on the order of a few million people dying. Obviously, the astronomical suffering outcomes are worse than catastrophic risks, but maybe something more comparable to catastrophic risks than existential risks.

Lucas: Given the taxonomy that you’ve gone ahead and unpacked, what are the different sorts of perspectives that different value systems on earth have of suffering risks? Just unpack a little bit what the general value systems are that human beings are running in their brains.

Kaj: If we look at ethics, philosophers have proposed a variety of different value systems and ethical theories. If we just look at the few of the main ones, then something like classical utilitarianism, where you basically view worlds as good based on what is the balance of happiness minus suffering. Then if you look at what would be the view of classical utilitarianism on suffering risks, classical utilitarianism would find these worst kinds of outcomes, net suffering outcomes as worse than extinction. But they might find astronomical suffering outcomes as an acceptable cost of having even more happy people. They might look at that, one galaxy full of suffering people, and think that, “Well, we have 200 galaxies full of happy people, so it’s not optimal to have those suffering people, but we have even more happy people, so that’s okay.

A lot of moral theories are not necessarily explicitly utilitarian, or they might have a lot of different components and so on, but a lot of them still include some kind of aggregative component, meaning that they still have some element of, for instance, looking at suffering and saying that other things being equal, it’s worse to have more suffering. This would, again, find suffering risks something to avoid, depending on exactly how they weight things and how they value things. Then it will depend on those specific weightings, on whether they find suffering risks as worse than extinction or not.

Also worth noting that even if the theories wouldn’t necessarily talk about suffering exactly, they might still talk about something like preference satisfaction, whether people are having their preferences satisfied, some broader notion of human flourishing, and so on. In scenarios where there is a lot of suffering, probably a lot of these things that these theories consider valuable would be missing. For instance, if there is a lot of suffering and people cannot escape that suffering, then probably there are lots of people whose preferences are not being satisfied, if they would prefer not to suffer and they would prefer to escape the suffering.

Then there are little kinds of rights-based theories, which don’t necessarily have this aggregative component directly, but are more focused on thinking in terms of rights, which might not be summed together directly, but depending on how these theories would frame rights … For instance, some theories might hold that people or animals have a right to avoid unnecessary suffering, or these kinds of theories might consider suffering indirectly bad if the suffering was created by some condition which violated people’s rights. Again, for instance, if people have a right for meaningful autonomy and they are in circumstances in which they cannot escape their suffering, then you might hold that their right for a meaningful autonomy has been violated.

A bunch of moral intuitions, which might fit a number of moral theories and which might particularly prioritize the prevention of suffering in particular. I mentioned that classical utilitarianism basically weights extreme happiness and extreme suffering the same, so it will be willing to accept a large amount of suffering if you could produce a lot of, even more, happiness that way. But for instance, there have been moral theories like prioritarianism proposed, which might make a different judgment.

Prioritarianism is the position that the worse off an individual is, the more morally valuable it is to make that individual better off. If one person is living in hellish conditions and another is well-off, then if you could sort of give either one of them five points of extra happiness, then it would be much more morally pressing to help the person who was in more pain. This seems like an intuition that I think a lot of people share, and if you had something like some kind of an astronomical prioritarianism that considered all across the universe and prioritized improving the worst ones off, then that might push in the direction of mainly improving the lives of those that would be worst off and avoiding suffering risks.

Then there are a few other sort of suffering-focused intuitions. A lot of moral intuitions have this intuition that it’s more important to make people happy than it is to create new happy people. This one is rather controversial, and a lot of EA circles seem to reject this intuition. It’s true that there are some strong arguments against it, but at the other hand, rejecting it also seems to lead to some paradoxical conclusions. Here, the idea behind this intuition is that the most important thing is helping existing people. If we think about, for instance, colonizing the universe, someone might argue that if we colonized the universe, then that will create lots of new lives who will be happy, and that will be a good thing, even if this comes at the cost of create a vast number of unhappy lives as well. But if you take the view that the important thing is just making existing lives happy and we don’t have any special obligation to create new lives that are happy, then it also becomes questionable whether it is worth the risk of creating a lot of suffering for the sake of just creating happy people.

Also, there is an intuition of, torture-level suffering cannot be counterbalanced. Again, there are a bunch of good arguments against this one. There’s a nice article by Toby Ord called “Why I Am Not a Negative Utilitarian,” which argues against versions of this thesis. But at the same time, there does seem to be something that has a lot of intuitive weight for a lot of people. Here the idea is that there are some kinds of suffering so intense and immense that you cannot really justify that with any amount of happiness. David Pearce has expressed this well in his quote where he says, “No amount of happiness or fun enjoyed by some organisms can notionally justify the indescribable horrors of Auschwitz.” Here we must think that, okay, if we go out and colonize the universe, and then we know that colonizing the universe is going to create some equivalent event as what went on in Auschwitz and at other genocides across the world, then no amount of happiness that we create that way will be worth that terrible terror that would probably also be created if there was nothing to stop it.

Finally, there’s an intuition of happiness being the absence of suffering, which is the sort of an intuition that is present in Epicureanism and some non-Western traditions, such as Buddhism, where happiness is thought as being the absence of suffering. The idea is that when we are not experiencing any pleasure, we begin to crave pleasure, and it is this craving that constitutes suffering. Under this view, happiness does not have intrinsic value, but rather it has instrumental value in taking our focus away from suffering and helping us avoid suffering that way. Under that view, creating additional happiness doesn’t have any intrinsic value if that creation does not help us avoid suffering.

I mentioned here a few of these suffering-focused intuitions. Now, in presenting these, my intent is not to say that there would not also exist counter-intuitions. There are a lot of reasonable people who disagree with these intuitions. But the general point that I’m just expressing is that regardless of which specific moral system we are talking about, these are the kinds of intuitions that a lot of people find plausible, and which could reasonably fit in a lot of different moral theories and value systems, and probably a lot of value systems contain some version of these.

Lucas: Right. It seems like the general idea is just that whether you’re committed to some sort of form of consequentialism or deontology or virtue ethics, or perhaps something that’s even potentially theological, there are lots of aggregative or non-aggregative, or virtue-based or rights-based reasons for why we should care about suffering risks. Now, it seems to me that potentially here probably what’s most important, or where these different normative and meta-ethical views matter in their differences, is in how you might proceed forward and engage in AI research and in deploying and instantiating AGI and superintelligence, given your commitment more or less to a view which takes the aggregate, versus one which does not. Like you said, if you take a classical utilitarian view, then one might be more biased towards risking suffering risks given that there might still be some high probability of there being many galaxies which end up having very net positive experiences, and then maybe one where there might be some astronomical suffering. How do you view the importance of resolving meta-ethical and normative ethical disputes in order to figure out how to move foreward in mitigating suffering risks?

Kaj: The general problem here, I guess you might say, is that there exist trade-offs between suffering risks and existential risks. If we had a scenario where some advanced general technology or something different might constitute an existential risk to the world, then someone might think about trying to solve that with AGI, which might have some probability of not actually working properly and not actually being value-aligned. But someone might think that, “Well, if we do not activate this AGI, then we are all going to die anyway, because of this other existential risk, so might as well activate it.” But then if there is a sizable probability of the AGI actually causing a suffering risk, as opposed to just an existential risk, then that might be a bad idea. As you mentioned, the different value systems will make different evaluations about these trade-offs.

In general, I’m personally pretty skeptical about actually resolving ethics, or solving it in a way that would be satisfactory to everyone. I expect there a lot of the differences between meta-ethical views could just be based on moral intuitions that may come down to factors like genetics or the environment where you grew up, or whatever, and which are not actually very factual in nature. Someone might just think that some specific, for instance, suffering-focused intuition was very important, and someone else might think that actually that intuition makes no sense at all.

The general approach, I would hope, that people take is that if we have decisions where we have to choose between an increased risk of extinction or an increased risk of astronomical suffering, then it would be better if people from all ethical and value systems would together try to cooperate. Rather than risk conflict between value systems, a better alternative would be to attempt to identify interventions which did not involve trading off one risk for another. If there were interventions that reduced the risk of extinction without increasing the risk of astronomical suffering, or decreased the risk of astronomical suffering without increasing the risk of extinction, or decreased both risks, then it would be in everyone’s interest if we could agree, okay, whatever our moral differences, let’s just jointly focus on these classes of interventions that actually seem to be a net positive in at least one person’s value system.

Lucas: Like you identify in the paper, it seems like the hard part is when you have trade-offs.

Kaj: Yes.

Lucas: Given this, given that most value systems should care about suffering risks, now that we’ve established the taxonomy and understanding of what suffering risks are, discuss a little bit about how likely suffering risks are relative to existential risks and other sorts of risks that we encounter.

Kaj: As I mentioned earlier, these depend somewhat on, are we assuming a superintelligence or a singleton or not? Just briefly looking at the case where we do not assume a superintelligence or singleton, we can see that in history so far there does not seem to be any consistent trend towards reduced suffering, if you look at a global scale. For instance, the advances in seafaring enabled the transatlantic slave trade, and similarly, advances in factory farming practices have enabled large amounts of animals being kept in terrible conditions. You might plausibly think that the net balance of suffering and happiness caused by the human species right now was actually negative due to all of the factory farmed animals, although it is another controversial point. Generally, you can see that if we just extrapolated the trends so far to the future, then we might see that, okay, there isn’t any obvious sign of there being less suffering in the world as technology develops, so it seems like a reasonable assumption, although not the only possible assumption, that as technology advances, it will also continue to enable more suffering, and future civilizations might also have large amounts of suffering.

If we look at the outcomes where we do have a superintelligence or a singleton running the world, here things get, if possible, even more speculative. In the beginning, we can at least think of some plausible-seeming scenarios in which a superintelligence might end up causing large amounts of suffering, such as building suffering subroutines. It might create mind-crime. It might also try to create some kind of optimal human society, but some sort of the value learning or value extrapolation process might be what some people might consider incorrect in such a way that the resulting society would also have enormous amounts of suffering. While it’s impossible to really give any probability estimates on exactly how plausible is a suffering risk, and depends on a lot of your assumptions, it does at least seem like a plausible thing to happen with a reasonable probability.

Lucas: Right. It seems that just technology, like intrinsic to what technology is, is it’s giving you more leverage and control over manipulating and shaping the world. As you gain more causal efficacy over the world and other sentient beings, it seems kind of obviously that yeah, you also gain more ability to cause suffering, because your causal efficacy is increasing. It seems very important here to isolate the causal factors in people and just in the universe in general, which lead to this great amount of suffering. Technology is a tool, a powerful tool, and it keeps getting more powerful. The hand by which the tool is guided is ethics.

But it doesn’t seem that historically, and in the case of superintelligence as well, that primarily the vast amounts of suffering that have been caused are because of failures in ethics. I mean, surely there has been large failures in ethics, but evolution is just an optimization process which leads to vast amounts of suffering. There could be similar evolutionary dynamics in superintelligence which lead to great amounts of suffering. It seems like issues with factory farming and slavery are not due to some sort of intrinsic malevolence in people, but rather it seems sort of like an ethical blind spot and apathy, and also a solution to an optimization problem where we get meat more efficiently, and we get human labor more efficiently. It seems like we can apply these lessons to superintelligence. It seems like it’s not likely that superintelligence will produce astronomical amounts of suffering due to malevolence.

Kaj: Right.

Lucas: Or like, intentional malevolence. It seems there might be, like, a value alignment problem or mis-specification, or just generally in optimizing that there might be certain things, like mind-crime or suffering subroutines, which are functionally very useful or epistemically very useful, and in their efficiency for making manifest other goals, they perhaps astronomically violate other values which might be more foundational, such as the mitigation of suffering and the promotion of wellbeing across all sentient beings. Does that make sense?

Kaj: Yeah. I think one way I might phrase that is that we should expect there to be less suffering if the incentives created by the future world for whatever agents are acting there happen to align with doing the kinds of things that cause less suffering. And vice versa, if the incentives just happen to align with actions that cause agents great personal benefit, or at least the agents that are in power great personal benefit while suffering actually being the inevitable consequence of following those incentives, then you would expect to see a lot of suffering. As you mentioned, with evolution, there isn’t even an actual agent to speak of, but just sort of in free-running optimization process, and the solutions which that optimization process has happened to hit on have just happened to involve large amounts of suffering. There is a major risk of a lot of suffering being created by the kinds of processes that are actually not actively malevolent, and some of which might actually care about preventing suffering, but then just the incentives are such that they end up creating suffering anyway.

Lucas: Yeah. I guess what I find very fascinating and even scary here is that there are open questions regarding the philosophy of mind and computation and intelligence, where we can understand pain and anger and pleasure and happiness and all of these hedonic valences within consciousness as, at very minimum, being correlated with cognitive states which are functionally useful. These hedonic valences are informationally sensitive, and so they give us information about the world, and they sort of provide a functional use. You discuss here how it seems like anger and pain and suffering and happiness and joy, all of these seem to be functional attributes of the mind that evolution has optimized for, and they may or may not be the ultimate solution or the best solution, but they are good solutions to avoiding things which may or may not be bad for us, and promoting behaviors which lead to social cohesion and group coordination.

I think there’s a really deep and fundamental question here about whether or not minds in principle can be created to have informationally-sensitive, hedonically-positive states. Is David Pearce puts it, there’s sort of an open question about, I think, whether or not minds in principle can be created to function on informationally-sensitive gradients of bliss. If that ends up being false, and that anger and suffering end up providing some really fundamental functional and epistemic place in minds in general, then I think that that’s just a hugely fundamental problem about the future and the kinds of minds that we should or should not create.

Kaj: Yeah, definitely. Of course, if we are talking about avoiding outcomes with extreme suffering, perhaps you might have scenarios where it is unavoidable to have some limited amount of suffering, but you could still create minds that were predominantly happy, and maybe they got angry and upset at times, but that would be a relatively limited amount of suffering that they experienced. You can definitely already see that there are some people alive who just seem to be constantly happy, and don’t seem to suffer very much at all. But of course, there is also the factor that if you are running on so-called negative emotions, and you do have anger and that kind of thing, then you are, again, probably more likely to react to situations in ways which might cause more suffering in others, as well as yourself. If we could create the kinds of minds that only had a limited amount of suffering from negative emotions, then you could [inaudible 00:49:27] that they happened to experience a bit of anger and lash out at others probably still wouldn’t be very bad, since other minds still would only experience the limited amount of suffering.

Of course, this gets to various philosophy of mind questions, as you mentioned. Personally, I tend to lean towards the views that it is possible to disentangle pain and suffering from each other. For instance, various Buddhist meditative practices are actually making people capable of experiencing pain without experiencing suffering. You might also have theories of mind which hold that the sort of higher-level theories of suffering are maybe too parochial. Like, Brian Tomasik has this view that maybe just anything that is some kind of negative feedback constitutes some level of suffering. Then it might be impossible to have systems which experienced any kind of negative feedback without also experiencing suffering. I’m personally more optimistic about that, but I do not know if I have any good, philosophically-rigorous reasons for being more optimistic, other than, well, that seems intuitively more plausible to me.

Lucas: Just to jump in here, just to add a point of clarification. It might seem sort of confusing how one might be experiencing pain without suffering.

Kaj: Right.

Lucas: Do you want to go ahead and unpack, then, the Buddhist concept of dukkha, and what pain without suffering really means, and how this might offer an existence proof for the nature of what is possible in minds?

Kaj: Maybe instead of looking at the Buddhist theories, which I expect some of the listeners to be somewhat skeptical about, it might be more useful to look at the term from medicine, pain asymbolia, also called pain dissociation. This is a known state which sometimes result from things like injury to the brain or certain pain medication, where people who have pain asymbolia report that they still experience pain, recognize the sensation of pain, but they do not actually experience it as aversive or something that would cause them suffering.

One way that I have usually expressed this is that pain is an attention signal, and pain is something that brings some sort of specific experience into your consciousness so that you become aware of it, and suffering is when you do not actually want to be aware of that painful sensation. For instance, you might have some physical pain, and then you might prefer not to be aware of that physical pain. But then even if we look at people in relatively normal conditions who do not have this pain asymbolia, then we can see that even people in relatively normal conditions may sometimes find the pain more acceptable. For some people who are, for instance, doing physical exercise, the pain may actually feel welcome, and a sign that they are actually pushing themselves to their limit, and feel somewhat enjoyable rather than being something aversive.

Similarly for, for instance, emotional pain. Maybe the pain might be some, like, mental image of something that you have lost forcing itself into your consciousness and making you very aware of the fact that you have lost this, and then the suffering arises if you think that you do not want to be aware of this thing you have lost. You do not want to be aware of the fact that you have indeed lost it and you will never experience it again.

Lucas: I guess just to sort of summarize this before we move on, it seems that there is sort of the mind stream, and within the mind stream, there are contents of consciousness which arise, and they have varying hedonic valences. Suffering is really produced when one is completely identified and wrapped up in some feeling tone of negative or positive hedonic valence, and is either feeling aversion or clinging or grasping to this feeling tone which they are identified with. The mere act of knowing or seeing the feeling tone of positive or negative valence creates sort of a cessation of the clinging and aversion, which completely changes the character of the experience and takes away this suffering aspect, but the pain content is still there. And so I guess this just sort of probably enters fairly esoteric territory about what is potentially possible with minds, but it seems important for the deep future when considering what is in principle possible of minds and superintelligence, and how that may or may not lead to suffering risks.

Kaj: What you described would be the sort of Buddhist version of this. I do tend to find that very plausible personally, both in light of some of my own experiences with meditative techniques, and clearly noticing that as a result of those kinds of practices, then on some days I might have the same amount of pain as I’ve had always before, but clearly the amount of suffering associated with that pain is considerably reduced, and also … well, I’m far from the only one who reports these kinds of experiences. This kind of model seems plausible to me, but of course, I cannot know it for certain.

Lucas: For sure. That makes sense. Putting aside the possibility of what is intrinsically possible for minds and the different hedonic valences within them and how they may or may not completely inter-tangled with the functionality of minds and the epistemics of minds, one of these possibilities which we’ve been discussing for superintelligence leading to suffering risks is that we fail in AI alignment. Failure in AI alignment may be due to governance, coordination, or political reasons. It might be caused by an arms race. It might be due to fundamental failures in meta-ethics or normative ethics. Or maybe even most likely it could simply be a technical failure in the inability for human beings to specify our values and to instantiate algorithms in AGI which are sufficiently well-placed to learn human values in a meaningful way and to evolve in a way that is appropriate and can engage new situations. Would you like to unpack and dive into dystopian scenarios created by non-value-aligned incentives in AI, and non-value-aligned AI in general?

Kaj: I already discussed these scenarios a bit before, suffering subroutines, mind-crime, and flawed realization of human values, but maybe one thing that would be worth discussing here a bit is that these kinds of outcomes might be created by a few different pathways. For instance, one kind of pathway is some sort of anthropocentrism. If we have a superintelligence that had been programmed to only care about humans or about minds which were sufficiently human-like by some criteria, then it might be indifferent to the suffering of other minds, including whatever subroutines or sub-minds it created. Or it might be, for instance, indifferent to the suffering experienced by, say, wild animal life in evolutionary simulations it created. Similarly, there is the possibility of indifference in general if we create a superintelligence which is just indifferent to human values, including indifference to reducing or avoiding suffering. Then it might create large numbers of suffering subroutines, it might create large amounts of simulations with sentient minds, and there is also the possibility of extortion.

Assuming the the superintelligence is not actually the only agent or superintelligence in the world … Maybe either there were several AI projects on earth that gained superintelligence roughly at the same time, or maybe the superintelligence expands into space and eventually encounters another superintelligence. In these kinds of scenarios, if one of the superintelligences cares about suffering but the other one does not, or at least does not care about this as much, then the superintelligence which cared less about suffering might intentionally create mind-crime and instate large numbers of suffering sentient beings in order to intentionally extort the other superintelligence into doing whatever it wants.

One more possibility is libertarianism regarding computation. If we have a superintelligence which has been programmed to just take every current living human being and give each human being some, say, control of an enormous amount of computational resources, and every human is allowed to do literally whatever they want with those resources, then we know that there exist a lot of people who are actively cruel and malicious, and many of those would use those resources to actually create suffering beings that they could torture for their own fun and entertainment.

Finally, if we are looking at these flawed realization kind of scenarios, where a superintelligence is partially value-aligned, but there might be something like, depending on the details of how exactly it is learning human values, and if it is doing some sort of extrapolation from those values, then we know that there have been times in history when circumstances that cause suffering have been defended by appealing to values that currently seem pointless to us, but which were nonetheless a part of the prevailing values at the time. If some value-loading process gave disproportionate weight to historical existing, or incorrectly, extrapolated future values, which endorsed or celebrated cruelty or outright glorified suffering, then we might get a superintelligence which had some sort of creation of suffering actually as an active value in whatever value function it was trying to optimize for.

Lucas: In terms of extortion, I guess just kind of a speculative idea comes to mind. Is there a possibility of a superintelligence acausally extorting other superintelligences if it doesn’t care about suffering and expects that to be a possible value, and for there to be other superintelligences nearby?

Kaj: Acausal stuff is the kind of stuff that I’m sufficiently confused about that I don’t actually want to say anything about that.

Lucas: That’s completely fair. I’m super confused about it too. We’ve covered a lot of ground here. We’ve established what s-risks are, we’ve established a taxonomy for them, we’ve discussed their probability, their scope. Now, a lot of this probably seems very esoteric and speculative to many of our listeners, so I guess just here in the end I’d like to really drive home how and whether to work on suffering risks. Why is this something that we should be working on now? How do we go about working on it? Why isn’t this something that is just so completely esoteric and speculative that it should just be ignored?

Kaj: Let’s start by looking at how we could working on avoiding suffering risks, and then when we have some kind of an idea of what the possible ways of doing that are, then that helps us say whether we should be doing those things. One thing that is a sort of a nicely joint interest of both reducing risks of extinction and also reducing risks of astronomical suffering is the kind of general AI value alignment work that is currently being done, classically, by the Machine Intelligence Research Institute and a number of other places. As I’ve been discussing here, there are ways by which an unaligned AI or one which was partially aligned could cause various suffering outcomes. If we are working on the possibility of actually creating value-aligned AI, then that should ideally also reduce the risk of suffering risks being realized.

In addition to technical work, there are also some societal work, social and political recommendations, which are similar both from the viewpoint of extinction risks and suffering risks. For instance, Nick Bostrom has noted that if we had some sort of conditions of what he calls global turbulence of cooperation and such things breaking down during some crisis, then that could create challenges for creating value-aligned AI. There are things like arms races and so on. If we consider that the avoidance of suffering outcomes is the joint interest of many different value systems, then measures that improve the ability of different value systems to cooperate and shape the world in their desired direction can also help avoid suffering outcomes.

Those were a few things that are sort of the same as with so-called classical AI risk work, but there is also some stuff that might be useful for avoiding negative outcomes in particular. There is the possibility that if we are trying to create an AI which gets all of humanity’s values exactly right, then that might be a harder goal than simply creating an AI which attempted to avoid the most terrible and catastrophic outcomes.

You might have things like fail-safe methods, where the idea of the fail-safe methods would be that if AI control fails, the outcome will be as good as it gets under the circumstances. This could be giving the AI the objective of buying more time to more carefully solve goal alignment. Or there could be something like fallback goal functions, where an AI might have some sort of fallback goal that would be a simpler or less ambitious goal that kicks in if things seem to be going badly under some criteria, and which is less likely to result in bad outcomes. Of course, here we have difficulties in selecting what the actual safety criteria would be and making sure that the fallback goal gets triggered under the correct circumstances.

Eliezer Yudkowsky has proposed building potential superintelligences in such a way as to make them widely separated in design space from ones that would cause suffering outcomes. For example, one thing he discussed was that if an AI has some explicit representation of what humans value which it is trying to maximize, then it could only take a small and perhaps accidental change to turn that AI into one that instead maximized the negative of that value and possibly caused enormous suffering that way. One proposal would be to design AIs in such a way that they never explicitly represent complete human values so that the AI never contains enough information to compute the kinds of states of the universe that we would consider worse than death, so you couldn’t just flip the sign of the utility function and then end up in a scenario that we would consider worse than death. That kind of a solution would also reduce the risk of suffering being created through another actor that was trying to extort a superintelligence.

Looking more generally at things and suffering risks, we actually already discussed here, there are lots of open questions in philosophy of mind and cognitive science which, if we could answer them, could inform the question of how to avoid suffering risks. If it turns out that you can do something like David Pearce’s idea of minds being motivated purely by gradients of wellbeing and not needing to suffer at all, then that might be a great idea, and if we could just come up with such agents and ensure that all of our descendants that go out to colonize the universe are ones that aren’t actually capable of experiencing suffering at all, then that would seem to solve a large class of suffering risks.

Of course, this kind of thing could also have more near-term immediate value, like if we figure out how to get human brains into such states where they do not experience much suffering at all, well, obviously that would be hugely valuable already. There might be some interesting research in, for instance, looking even more at all the Buddhist theories and the kinds of cognitive changes that various Buddhist contemplative practices produce in people’s brains, and see if we could get any clues from that direction.

Given that these were some ways that we could reduce suffering risks and their probability, then there was the question of whether we should do this. Well, if we look at the initial criteria of when a risk is worth working on, a risk is worth working on if the adverse outcome would be severe and if the risk has some reasonable probability of actually being realized, and it seems like we can come up with interventions that plausible effect either the severity or the probability of a realized outcome. Then a lot of times things seem like they could very plausible either influence these variables or at least help us learn more about whether it is possible to influence those variables.

Especially given that a lot of this work overlaps with the kind of AI alignment research that we would probably want to do anyway for the sake of avoiding extinction, or it overlaps with the kind of work that would regardless be immensely valuable in making currently-existing humans suffer less, in addition to the benefits that these interventions would have on suffering risks themselves, it seems to me like we have a pretty strong case for working on these things.

Lucas: Awesome, yeah. Suffering risks are seemingly neglected in the world. They are tremendous in scope, and they are of comparable probability of existential risks. It seems like there’s a lot that we can do here today, even if at first the whole project might seem so far in the future or so esoteric or so speculative that there’s nothing that we can do today, whereas really there is.

Kaj: Yeah, exactly.

Lucas: One dimension here that I guess I just want to finish up on that is potentially still a little bit of an open question for me is, in terms of really nailing down the likelihood of suffering risks in, I guess, probability space, especially relative to the space of existential risks. What does the space of suffering risks look like relative to that? Because it seems very clear to me, and perhaps most listeners, that this is clearly tremendous in scale, that it relies on some assumptions about intelligence, philosophy of mind, consciousness and other things which seem to be reasonable assumptions, to sort of get suffering risks off the ground. Given some reasonable assumptions, it seems that there’s a clearly large risk. I guess just if we could unpack a little bit more about the probability of them relative to suffering risks. Is it possible to more formally characterize the causes and conditions which lead to x-risks, and then the causes and conditions which lead to suffering risks, and how big these spaces are relative to one another and how easy it is for certain sets of causes and conditions respective to each of the risks to become manifest?

Kaj: That is an excellent question. I am not aware of anyone having done such an analysis for either suffering risks or extinction risks, although there is some work on specific kinds of extinction risks. Seth Baum has been doing some nice fault tree analysis of things that might … for instance, the probability of nuclear war and the probability of unaligned AI causing some catastrophe.

Lucas: Open questions. I guess just coming away from this conversation, it seems like the essential open questions which we need more people working on and thinking about are the ways in which meta-ethics and normative ethics and disagreements there change the way we optimize the application of resources to either existential risks versus suffering risks, and the kinds of futures which we’d be okay with, and then also sort of pinning down more concretely the specific probability of suffering risks relative to existential risks. Because I mean, in EA and the rationality community, everyone’s about maximizing expected value or utility, and it seems to be a value system that people are very set on. And so the probability here, small changes in the probability of suffering risks versus existential risks, probably leads to vastly different, less or more, amounts of value in a variety of different value systems. Then there are tons of questions about what is in principle possible of minds and the kinds of minds that we’ll create. Definitely a super interesting field that is really emerging.

Thank you so much for all this foundational work that you and others like your coauthor, Lukas Gloor, have been doing on this paper and the suffering risk field. Is there any other things you’d like to touch on? Any questions or specific things that you feel haven’t been sufficiently addressed?

Kaj: I think we have covered everything important. I will probably think of something that I will regret not mentioning five minutes afterwards, but yeah.

Lucas: Yeah, yeah. As always. Where can we check you out? Where can we check out the Foundational Research Institute? How do we follow you guys and stay up to date?

Kaj: Well, if you just Google the Foundational Research Institute or go to foundational-research.org, that’s our website. We, like everyone else, also post stuff on a Facebook page, and we have a blog for posting updates. Also, if people want a million different links just about everything conceivable, they will probably get that if they follow my personal Facebook, page, where I do post a lot of stuff in general.

Lucas: Awesome. Yeah, and I’m sure there’s tons of stuff, if people want to follow up on this subject, to find on your guys’s site, as you guys are primarily the people who are working and thinking on this sorts of stuff. Yeah, thank you so much for your time. It’s really been a wonderful conversation.

Kaj: Thank you. Glad to be talking about this.

Lucas: If you enjoyed this podcast, please subscribe, give it a like, or share it on your preferred social media platform. We’ll be back again soon with another episode in the AI Alignment series.

[end of recorded material]

Podcast: Nuclear Dilemmas, From North Korea to Iran

With the U.S. pulling out of the Iran deal and canceling (and potentially un-canceling) the summit with North Korea, nuclear weapons have been front and center in the news this month. But will these disagreements lead to a world with even more nuclear weapons? And how did the recent nuclear situations with North Korea and Iran get so tense? (Update: The North Korea summit happened! But to understand what the future might look like with North Korea and Iran, it’s still helpful to understand the past.)

To learn more about the geopolitical issues surrounding North Korea’s and Iran’s nuclear situations, as well as to learn how nuclear programs in these countries are monitored, Ariel spoke with Melissa Hanham and Dave Schmerler on this month’s podcast. Melissa and Dave are both nuclear weapons experts with the Center for Nonproliferation Studies at Middlebury Institute of International Studies, where they research weapons of mass destruction with a focus on North Korea. Topics discussed in this episode include:

  • the progression of North Korea’s quest for nukes,
  • what happened and what’s next regarding the Iran deal,
  • how to use open-source data to monitor nuclear weapons testing, and
  • how younger generations can tackle nuclear risk.

In light of the on-again/off-again situation regarding the North Korea Summit, Melissa sent us a quote after the podcast was recorded, saying:

“Regardless of whether the summit in Singapore takes place, we all need to set expectations appropriately for disarmament. North Korea is not agreeing to give up nuclear weapons anytime soon. They are interested in a phased approach that will take more than a decade, multiple parties, new legal instruments, and new technical verification tools.”

Links you might be interested in after listening to the podcast:

You can listen to the podcast above or read the transcript below.

 

Ariel: Hello. I am Ariel Conn with the Future of Life Institute. This last month has been a rather big month concerning nuclear weapons, with the US pulling out of the Iran deal and the on again off again summit with North Korea.

I have personally been doing my best to keep up with the news but I wanted to learn more about what’s actually going on with these countries, some of the history behind the nuclear weapons issues related to these countries, and just how big a risk nuclear programs in these countries could become.

Today I have with me Melissa Hanham and Dave Schmerler, who are nuclear weapons experts with the Center for Nonproliferation Studies at Middlebury Institute of International Studies. They both research weapons of mass destruction with a focus on North Korea. Melissa and Dave, thank you so much for joining us today.

Dave: Thanks for having us on.

Melissa: Yeah, thanks for having us.

Ariel: I just said that you guys are both experts in North Korea, so naturally what I want to do is start with Iran. That has been the bigger news story of the two countries this month because the US did just pull out of the Iran deal. Before we get any further, can you just, if it’s possible, briefly explain what was the Iran deal first? Then we’ll get into other questions about it.

Melissa: Sure. The Iran deal was an agreement made between the … It’s formally known as the JCPOA and it was an agreement made between Iran and several countries around the world including the European Union as well. The goal was to freeze Iran’s nuclear program before they achieved nuclear weapons while still allowing them civilian access to medical isotopes, and power, and so on.

At the same time, the agreement would be that the US and others would roll back sanctions on Iran. The way that they verified that agreement was through a procurement channel, if-needed onsite inspections, and regular reporting from Iran. As you mentioned, the US has withdrawn from the Iran deal, which is really just, they have violated the terms of the Iran deal, and Iran and European Union and others have said that they wish to continue in the JCPOA.

Ariel: If I’ve been reading correctly, the argument on the US side is that Iran wasn’t holding up their side of the bargain. Was there actually any evidence for that?

Dave: I think the American side for pulling out was more based on them lying about having a nuclear weapons program at one point in time, leading up to the deal, which is strange, because that was the motivation for the deal in the first place, was to stop them from continuing their nuclear weapons, their research and investment. So, I’m not quite sure how else to frame it outside of that.

Melissa: Yeah, Israeli President Netanyahu, made this presentation where he revealed all these different archived documents in Iran, and mostly what they indicated was that Iran had an ongoing nuclear weapons program before the JCPOA, which is what we knew, and that they were planning on executing that program. For people like me, I felt like that was the justification for the JCPOA in the first place.

Ariel: And so, you both deal a lot with, at least Melissa I know you deal a lot with monitoring. Dave, I believe you do, too. With something like the Iran deal, if we had continued with it, what is the process involved in making sure the weapons aren’t being created? How do we monitor that?

Melissa: It’s a really difficult multilayered technical and legal proposition. You have to get the parties involved to agree to the terms, and then you have to be able to technically and logistically implement the terms. In the Iran deal, there were some things that were included and some things that were not included. Not because it was not technically possible, but because Iran or the other parties would not agree to it.

It’s kind of a strange marriage between diplomacy and technology, in order to execute these agreements. One of the criticisms of the Iran deal was that missiles weren’t included, so sure enough, Dave was monitoring many, many missile launches, and our colleague, Shea Cotton, even made a database of North Korean missile launches, and Americans really hated that Iran was launching these missiles, and we could see that they were happening. But the bottom line was that they were not part of the JCPOA agreement. That agreement focused only on nuclear, and the reason it did was because Iran refused to include missiles or human rights and these other kinds of things.

Dave: That’s right. Negotiating Iran’s missile program is a bit of another issue entirely. Iran’s missile program began before their nuclear program did. It’s accelerated, development has corresponded to their own security concerns within the region, and they have at the moment, a conventional ballistic missile force. The Iranians look at that program as being a completely different issue.

Ariel: Just quickly, how do you monitor a missile test? What’s involved in that? What do you look for? How can you tell they’re happening? Is it really obvious, or is there some sort of secret data you access?

Dave: A lot of the work that we do — Melissa and I, Shea Cotton, Jeffrey Lewis, and some other colleagues — is entirely based on information from the public. It’s all open source research, so if you know what you’re looking for, you can pull all the same information that we do from various sources of free information. The Iranians will often put propaganda or promo videos of their missile tests and launches as a way to demonstrate that they’re becoming a more sophisticated, technologically modern, ballistic missile producing nation.

We also get reports from the US government that are published in news sources. Whether from the US government themselves, or from reporters who have connections or access to the inside, and we take all this information, and Melissa will probably speak to this a bit further, but we fuse it together with satellite imagery of known missile test locations. We’ll reconstruct a much larger, more detailed chain of events as to what happened when Iran does missile testing.

Melissa: I have to admit, there’s just more open source information available about missile tests, because they’re so spread out over large areas and they have very large physical attributes to the sites, and of course, something lights up and ignites, and it takes off into the air where everyone can see it. So, monitoring a missile launch is easier than monitoring a specific facility in a larger network of facilities, for a nuclear program.

Ariel: So now that Trump has pulled out of the Iran deal, what happens next with them?

Melissa: Well, I think it’s probably a pretty bad sign. What I’ve heard from colleagues who work in or around the Trump administration is that confidence was extremely high on progress with North Korea, and so they felt that they didn’t need the Iran deal anymore. And in part, the reason that they violated it was because they felt that they had so much already going in North Korea, and those hopes were really false. There was a huge gap between reality and those hopes. It can be frustrating as an open source analyst who says these things all the time on Twitter, or in reports, that clearly nobody reads them. But no, things are not going well in North Korea. North Korea is not unilaterally giving over their nuclear weapons, and if anything, violating the Iran deal has made North Korea more suspicious of the US.

Ariel: I’m going to use that to transition to North Korea here in just a minute, but I guess I hadn’t realized that there was a connection between things seeming to go well in North Korea and the US pulling out of the Iran deal. You talk about hopes that the Iran deal is now necessary for North Korea, but what is the connection there? How does that work?

Melissa: Well, so the Iran deal represented diplomatic negotiation with an outcome among many parties that came to a concrete result. It happened under the Obama administration, which I think is why there is some distaste for it under the Trump administration. That doesn’t matter to North Korea. That doesn’t matter to other states. What matters is whether the United States appears to be able to follow through on a promise that may pass one administration to another.

The US has in a way, violated some norms about diplomatic behavior, by withdrawing from this agreement. That’s not to say that the US hasn’t done it before. I remember Clinton signing the, I think Rome Treaty, for the International Criminal Accord, then Bush unsigning it, it never got ratified. But it’s bad for our reputation. It makes us look like we’re not using international law the way other countries expect us to.

Ariel: All right. So before we move officially to North Korea, is there anything else, Melissa and Dave, that either of you want to mention about Iran that you think is either important for people to know about, that they don’t already, or that is important to reiterate?

Melissa: No. I guess let’s go to North Korea. That’s our bread and butter.

Ariel: All right. Okay, so yeah, North Korea’s been in the news for a while now. Before we get to what’s going on right now, I was hoping you could both talk a little bit about some of the background with North Korea, and how we got to this point. North Korea was once part of the Non-Proliferation Treaty, and they pulled out. Why were they in it in the first place? What prompted them to pull out? We’ll go from there.

Melissa: Okay, I’ll jump in, although Dave should really tell me if I keep talking over him. North Korea withdrew from the NPT, or so it said. It’s actually diplomatically very complex what they did, but North Korea either was or is a member of the Nuclear Non-Proliferation Treaty, the NPT, depending on who you ask. That is in large part because they were, and then they announced their withdrawal in 2003, and eventually we no longer think of them as officially being a member of the NPT, but of course, there were some small gaps over the notification period that they gave in order to withdraw, so I think my understanding is that some of the organizations involved actually keep a little North Korean nameplate for them.

But no, we don’t really think of them as being a member of an NPT, or IAEA. Sadly, while that may not be a legally settled, they’re out, they’re not abiding by traditional regimes or norms on this issue.

Ariel: And can you talk a little bit about, or do we know what prompted them to withdraw?

Melissa: Yeah. I think they really, really wanted nuclear weapons. I mean, I’m sorry to be glib about it, but … Yeah, they were seeking nuclear weapons since the ’50s. Kim Il-sung said he wanted nuclear weapons, he saw the power of the US’ weapons that were dropped on Japan. The US threatened North Korea during the Korean War with use of nuclear weapons, so yeah, they had physicists working on this issue for a long time.

They joined the NPT, they wanted access to the peaceful uses of nuclear power, they were very duplicitous in their work, but no, they kept working towards nuclear weapons. I think they reached a point where they probably thought that they had the technical capability, and they were dissatisfied with the norms and status as a pariah state, so yeah, they announced they were withdrawing, and then they exploded something three years later.

Ariel: Now that they’ve had a program in place then I guess for, what? Roughly 15 years then?

Melissa: Oh, my gosh. Math. Yeah. No, so I was sitting in Seoul. Dave, do you remember where you were when they had their first nuclear test?

Dave: This was-

Melissa: 2006.

Dave: A long time ago. I think I was still in high school.

Melissa: I mean, this is a challenge to our whole field, right? Is that there are generations passing through, so there are people who remember 1945. I don’t. But I’m not going to reveal my age. I was fresh out of grad school, and working in Seoul when North Korea tested its first nuclear device.

It was like cognitive dissonance around the world. I remember the just shock of the response out of pretty much every country. I think China had a few minutes notice ahead of everybody else, but not much. So yes, we did see the reactor getting built, yes, we did see activity happening at Yongbyon, no we deeply misunderstood and underestimated North Korea’s capabilities.

So, when that explosion happened, it was surprising, to people in the open source anyways. People scrambled. I mean, that was my first major gig. That’s why I still do this today, was we had an office at the International Crisis Group, of about six people, and all our Korean speakers were immediately sucked into other responsibilities, and so it was up to me to try to take out all these little puzzle pieces, about the seismic information, about the radionuclides that were actually leaked in that first explosion, and figure out what a Constant Phoenix was, and who was collecting what, and put it all together to try to understand what kind of warhead that they may or may not have exploded, if it was even a warhead at that point.

Ariel: I’m hoping that you can explain how monitoring works. I’m an ex-seismologist, so I actually do know a little bit about the seismic side of monitoring nuclear weapons testing, but I’m assuming a lot of listeners do not. I’m not as familiar with things like the radionuclide testing, or the Phoenix that you mentioned was a new phrase for me as well. I was hoping you could explain what you go through to monitor and confirm whether or not a nuclear weapon has been tested, and before you do that real quick — so did you actually see that first … Could you see the explosion?

Melissa: No. I was in Seoul, so I was a long ways away, and I didn’t really … Of course, I did not see or feel anything. I was in an office in downtown Seoul, so I remember actually how casual the citizens of Seoul were that day. I remember feeling kind of nervous about the whole thing. I was registered with the Canadian embassy in Seoul, and we actually had, when you registered with the embassy, we had instructions of what to do in case of an emergency.

I remember thinking, “Gosh, I wonder if this is an emergency,” because I was young and fresh out of school. But no, I mean, as I looked down out of our office windows, sure enough at noon, the doors opened up and all my Korean colleagues streamed out to lunch together, and really behaved pretty traditionally, the way everyone normally does.

South Koreans have always been very stoic about these tests, and I think they’re taken more anxiously by foreigners like me. But I do also remember there were these aerial sirens going off that day, and I actually never got an explanation of why there were sirens going off that day. I remember they tested them when I lived there, but I’m not sure why the sirens were going off that day.

Ariel: Okay. Let’s go back to how the monitoring works, and Dave, I don’t know if this is something that you can also jump in on?

Dave: Yeah, sure. I think I’ll let Melissa start and I’ll try to fill in any gaps, if there are any.

Melissa: So, the Comprehensive Test Ban Treaty Organization is an organization based in Vienna, but they have stations all over the world, and they’re continually monitoring for nuclear explosions. The Constant Phoenix is a WC-135. It’s a US Air Force vehicle, and so the information coming out of it is not open source and I don’t get to see it, but what I can do, or what journalists, investigative journalists sometimes do, is, say, when it’s taking off from Guam, or an Air Force Base, and then I know at least that the US Air Force is thinking it’s going to be sensing something, so this is like a specialty vehicle. I mean, it’s basically an airplane, but it has many, many interesting sensor arrays all over it that sniff the air. What they’re trying to detect are xenon isotopes, and these are isotopes that are possibly released from an underground nuclear test, depending on how well the tunnel was sealed.

In that very first nuclear explosion in 2006, some noble gases were released and I think that they were detected by the WC-135. I also remember back then, although this was a long time ago, that there were a few sensing stations in South Korea that detected them as well. What I remember from that time is that the ratio of xenon isotopes was definitely telling us that this was a nuclear weapon. This wasn’t like a big hoax that they’d exploded a bunch of dynamite or something like that, which actually would be a really big hoax, and hard to pull off. But we could see that it was a nuclear test, it was probably a fission device. The challenge with detecting these gases is that they decay very quickly, so we have, 1) not always sensed radionuclides after North Korea’s nuclear tests, and, 2) if we do sense them, sometimes they’re decayed enough that we can’t get anything more than it was a nuclear test, and not a chemical explosion test.

Dave: Yeah, so I might be able to offer, because Melissa did a great job of explaining how the process works, is maybe a bit more of a recent mechanism and how we interact with these tests as they occur. Usually most of the people in our field follow a set number of seismic-linked Twitter accounts that will give you updates on when some part of the world is shaking for some reason or another.

They’ll put a tweet or maybe you’ll get an email update saying, “There was an earthquake in California,” because we get earthquakes all the time, or in Japan. Then, all of a sudden you hear there’s an earthquake in North Korea and everyone pauses. You look at this little tweet, I guess, or email, you can also get them sent to your phone via text message, if you sign up for whichever region of the world you’re interested in, and you look for what province was this earthquake in?

If it registers in the right province, you’re like, “Okay.” What’s next is we’ll look at the data that comes out immediately. CTBTO will come out with information, usually within a couple of days, if not immediately after, and we’ll look at the seismic waves. While I don’t study these waves, the type of seismic signature you get from a nuclear explosion is like a fingerprint. It’s very unique and different from the type of seismic signature you get from an earthquake of varying degrees.

We’ll take that and compare those to previous tests, which the United States and Russia have done infinitely more than any other country in the world. And we’ll see if those match. And as North Korea has tested more nuclear devices, the signatures started coming more consistent. If that matches up, we’ll have a soft confirmation that they did it, and then we’ll wait for government news, press releases to give us the final nail confirming that there was a nuclear test.

Melissa: Yeah, so as Dave said, as a citizen scientist, I love just setting up the USGS alert, and then if there’s an earthquake near the village of Punggye-ri, I’m like, “Ah-hah, I got you” because it’s not a very seismically active area. When the earthquakes happen that are related to an underground nuclear test, they’re shallow. They’re not deep, geological events.

Yeah, there’s some giveaways like, people like to do them on the hour, or the half hour, and mother nature doesn’t care. But some resources for your listeners, if they want to get involved and see, is you can go to the USGS website and set up your own alert. The CTBTO has not just seismic stations, but the radionuclide stations I mentioned, as well as infrasound and hydroacoustic, and other types of facilities all over the world. There’s a really cool map on their website where they show the over… I think it’s nearly 300 stations all around the world now, that are devoted exclusively to monitoring nuclear tests.

They get their information out, I think in seven minutes, and I don’t get that information necessarily in the first seven minutes, because I’m not a state member, a state party. But they will give out information very soon afterwards, and actually based on the seismic data, our colleagues, Jeffrey Lewis and some other young, smart people of the world, actually threw together a map, not using CTBTO data, but using the seismic stations of I think Iran, China, Japan, South Korea, and so if you go to their website, it’s called SleuthingFromTheInternet.com, you can set up little alerts there too, or scale for all the activities that are happening.

That was really just intended I think to be a little bit transparent with the seismic data and try to see data from different country stations, and in part, it was conceived because I think the USGS was deleting some of their explosions from the database and someone noticed. So now the idea is that you take a little bit of data from all these different countries, and that you can compare it to each other.

The last place I would suggest is to go to the IRIS seismic monitoring station, because just as Dave was mentioning, each seismic event has a different P wave, and so it shows up differently, like a fingerprint. And so, when IRIS puts out information, you can very quickly see how the different explosions in North Korea compare to each other, relatively, and so that can be really useful, too.

Dave: I will say, though, that sometimes you might get a false alarm. I believe it was with the last nuclear test. There was one reporting station, their automatic alert system that was put up out of the UK, that didn’t report it. No one caught that it didn’t, and then it did report it like a week later. So, for all of half an hour until we figured it out, there was a bit of a pause because there was some concern they might have done another test again, which would have been the seventh, but it turned out just being a delayed reporting.

Dave: Most of the time these things work out really well, but you always have to look for secondary and third sources of confirmation when these types of events happen.

Ariel: So a quick aside, we will have links to everything that you both just brought up in the transcript, so anyone interested in following up with any of these options, will be able to. I’m also going to share a fun fact that I learned, and that was, we originally had a global seismic network in order to monitor nuclear weapons testing. That’s why it was set up. And it’s only because we set that up that we actually were able to prove the plate tectonics theory.

Melissa: Oh, cool.

Dave: That’s really cool.

Melissa: Yeah. No, the CTBTO is really interesting, because even though the treaty isn’t enforced yet, they have these amazing scientific resources, and they’ve done all kinds of things. Like, they can hear whales moving around with their hydroacoustic technology, and when Iran had an explosion, a major explosion at their solid motor missile facility, they detected that as well.

Ariel: Yeah. It’s fun. Like I said, I did seismology a while ago so I’m signed up for lots of fun alerts. It’s always fun to learn about where things are blowing up in the earth’s surface.

Melissa: Well, that’s really the magic of open source to me. I mean, it used to be that a government came out and said, “Okay, this is what happened, and this is what we’re going to do about it.” But the idea that me, like a regular person in the world, can actually look up this primary information in the moments that it happens, and make a determination for myself, is really empowering. It makes me feel like I have the agency I want to have in understanding the world, and so I have to admit, that day in South Korea, when I was sitting there in the office tower and it was like, “Okay, all hands on deck, everyone’s got to write a report” and I was trying to figure it out, I was like, “I can’t believe I’m doing this. I can’t believe I can do this.” It’s such a different world already.

Ariel: Yeah. That is really amazing. I like your description. It’s really empowering to know that we have access to this information. So, I do want to move on and with access to this information, what do we know about what’s going on in North Korea right now? What can you tell us about what their plans are? Do we think the summit will happen? I guess I haven’t kept up with whatever the most recent news is. Do we think that they will actually do anything to get rid of their nuclear weapons?

Dave: I think at this point, the North Koreans feel really comfortable with the amount of information and progress they’ve made in their nuclear weapons program. That’s why they’re willing to talk. This program was primarily as a means to create a security assurance for the North Koreans because the Americans and South Koreans and whatnot have always been interested in regime change, removing North Korea from the equation, trying to end the thing that started in the 1950s, the Korean War, right? So there’d just be one Korea, we wouldn’t have to worry about North Korea, or this mysterious Hermit Kingdom, above the 38th parallel.

With that said, there’s been a lot of speculation as to why the North Koreans are willing to talk to us now. Some people have been floating around the idea that maximum pressure, I think that was the word used, with sanctions and whatnot, has brought the North Koreans to their knees, and now they’re willing to give up their nukes, as we’ve been hearing about.

But the way the North Koreans use denuclearization is very important. Because on one hand, that could mean that they’re willing to give up their nuclear weapons, and to denuclearize the state itself, but the way the North Koreans use it is much broader. It’s more used in the way of denuclearizing the peninsula. It’s not specifically reflective onto them.

Now that they’ve finally achieved some type of reasonable success with their nuclear weapons program, they’re more in a position where they think they can talk to the United States as equals, and denuclearization falls into the terminology that it’s used by other nuclear weapons states, where it’s a, “In a better world we won’t need these types of horrible weapons, but we don’t live in that world today, so we will stand behind the effort to denuclearize, but not right now.”

Melissa: Yeah, I think we can say that if we look at North Korea’s capabilities first, and then why they’re talking now, we can see that in the time when Dave and I were cutting our teeth, they were really ramping up their nuclear and missile capabilities. It wasn’t immediately obvious, because a lot of what was happening was inside a laboratory or inside a building, but then eventually they started doing nuclear tests and then they did more and more missile tests.

It used to be that a missile test was just a short range missile off the coast, sometimes it was a political grandstanding. But if you look, our colleague, Shea Cotton, made a missile database that shows every North Korean missile test, and you can see that in the time under Kim Jong-un, those tests really started to ramp up. I think Dave, you started at CNS in like 2014?

Dave: Right around then.

Melissa: Right around then, so they jumped up to like 19 missile tests that year. I can say this because I’m looking at the database right now, and they started doing really more interesting things than ever before, too. Even though diplomatically and politically we were still thinking of them as being backwards, as not having a very good capability, if we looked at it quantitatively, we could say, “Well, they’re really working on something.”

So Dave actually was really excellent at geolocating. When they did engine tests, we could measure the bell of the engine and get a sense of what those engines were about. We could see solid fuel motors being tested, and so this went all the way up until ICBM launched last fall, and then they were satisfied.

Ariel: So when you say engine testing, what does that mean? What engine?

Dave: The North Korean ballistic missile fleet used to be entirely tied to this really old Soviet missile called the Scud. If anyone’s played video games in the late ’90s or early 2000s, that was the small missile that you always had to take out or something along that line, and it was fairly primitive. It was a design that the North Koreans hadn’t demonstrated they were able to move beyond, that’s why then the last three years started to kick in, the North Koreans started to field more complicated missiles instead of showing that they were doing engine tests with more experimental, more advanced designs that we had seen in other parts of the world previously. Some people were a bit speculative or doubting that the North Koreans were actually making serious progress. Then last year, they tested their first intermediate range ballistic missile which can hit Guam, which is something that they’ve been trying to do for a while, but it hadn’t worked out. Then, they made that missile larger, they made their first ICBM.

Then they made that missile even larger, came up with a much more ambitious engine design using two engines instead of one. They had a much more advanced steering system, and they came up with the Hwasong-15 which is their longest range ICBM. It’s a huge shift from the way we were having this conversation 5 to 10 years ago, where we were looking at their space launch vehicles, which were, again, modified Scuds that were stretched out and essentially tied together, to an actual functioning ICBM fleet.

The technological shift in pair with their nuclear weapons developments have really demonstrated that the North Koreans are no longer this 10 to 20 year, around the corner threat, that they actually possess the ability to launch nuclear weapons at the United States.

Melissa: And back when they had their first nuclear test in 2006, people were like, “It’s a device.” I think for years, we still call it a device. But back then, the US and others kept moving the goalposts. They were saying, “Well, all right. They had a nuclear device explode. We don’t know how big it was, they have no way of delivering it. We don’t know what the yield was. It probably fizzled.” It was dismissive.

So, from that period, 2006 to today, it’s a real remarkable challenge. Almost every criticism that North Korea has faced, right down to their heat shield on their ICBM, has been addressed vociferously with propaganda, photos and videos that we in turn can analyze. And yeah, I think they have demonstrated essentially that they can explode something, they can launch a missile that can carry something that can explode.

The only thing they haven’t done, and Dave can chime in here, is explode a nuclear weapon on the tip of a missile. Other countries have done this, and it’s terrifying, and because Dave is such a geographically visual person, I’ll let him describe what that might look like. But if we keep goading them, if we keep telling them they’re backwards, eventually they’re going to want to prove it.

Dave: Yeah, so off of Melissa’s point, this is something that I believe Jeffrey might have coined. It’s called the Juche Bird, which is a playoff of Frigate Bird, which was a live nuclear warhead test that the Americans conducted. The North Koreans, in order to prove that the system in its entirety — the nuclear device, the missile, the reentry shield — all work and it’s not just small random successes in different parts of a much larger program, is they would take a live nuclear weapon, put it on the end of a long range missile, launch it in the air, and detonate it at a specific location to show that they have the ability to actually use the purported weapon system.

Melissa: So if you’re sitting in Japan or South Korea, but especially Japan, and you imagine North Korea launching an intermediate range or intercontinental ballistic missile over your country, with a nuclear weapon on it, in order to execute an atmospheric test, that makes you extremely nervous. Extremely nervous, and we all should be a little bit nervous, because it’s really hard for anyone in the open source, and I would argue in the intelligence community, to know, “Well, this is just an atmospheric test. This isn’t the beginning of a war.”

We would have to trust that they pick up the trajectory of that missile really fast and determine that it’s not heading anywhere. That’s the challenge with all of these missile tests, is no one can tell if there’s a warhead on it, or not a warhead on it, and then we start playing games with ballistic missile defense, and that is a whole new can of worms.

Ariel: What do you guys think is the risk that North Korea or any other country for that matter, would intentionally launch a nuclear weapon at another country?

Melissa: For me, it’s accidents, and an accident can unfold a couple of different ways. One way would be perhaps the US is performing joint exercises. North Korea has some sensing equipment up on peaks of mountains, and Dave has found every single one probably, but it’s not perfect. It’s not great, and if the picture comes back to them, it’s a little fuzzy, maybe this is no longer a joint exercise. This is the beginning of an attack. They will decide to engage.

They’ve long said that they believe that a war will start based on the pretext of a joint exercise. In reverse scenario, what if North Korea does launch an ICBM with a nuclear warhead, in order to perform a test, and the US or Japan or South Korea think, “Well, this is it. This is the war.” And so it’s those accidental scenarios that I worry about, or even perhaps what happens if a test goes badly? Or, someone is harmed in some way?

I worry that these states would have a hard time politically rolling back where they feel they have to be, based on these high stakes.

Dave: I agree with Melissa. I think the highest risk we have is also depending on our nuclear posture in accident. There have been accidents that have happened in the past where someone in a monitoring base picks up a bunch of bleeps on a radar, and people start initiating the game on protocol, and luckily we’ve been able to avoid that to its completion in the past.

Now, with the North Koreans, this could also work in their direction, as well. I can’t imagine that their sensing technology is up to par with what the United States has, or had, back when these accidents were a real thing and they happened. So if the North Koreans see a military exercise that they don’t feel comfortable with, or they have some type of technical glitch on their side, they might notionally launch something, and that would be the start of a conflict.

Ariel: One of the final questions that I have for both of you. I’ve read that while nuclear weapons are scary, the greater threat with North Korea could actually be their conventional weapons. Could either of you speak to that?

Dave: Yeah, sure. North Korea has a very large conventional army. Some people might try to make jokes about how modern that army is, but military force only needs to be so modern with the type of geographical game that’s in play on the Korean Peninsula. Seoul is really not that far from the DMZ, and it’s a widely known fact that North Korea has tons of artillery pointed at Seoul. They’ve had these things pointed there since the end of the Korean War, and they’re all entrenched.

You might be able to hit some of them, but you’re not going to hit all of them. This type of artillery, in connection with their conventional ballistic missile force, we’re talking about things that aren’t carrying a WMD, it’s a real big threat for some type of conventional action.

Seoul is a huge city. The metropolitan area at least has a population of over 20 million people. I’m not sure if you’ve ever been to Seoul, it’s a great, beautiful city, but traffic is horrible, and if everyone’s trying to leave the city when something happens, everyone north of the river is screwed, and congestion on the south side, it would just be a total disaster. Outside of the whole nuclear aspect of this dangerous relationship, the conventional forces North Korea has are equally as terrifying.

Melissa: I think Dave’s bang on, but the only thing I would add is that one of the things that’s concerning about having both nuclear and conventional forces is how you use your conventional forces with that extra nuclear guarantee. This is something that our boss, Jeffrey Lewis, has written about extensively. But do you use that extra measure of security and just preserve it, save it? Does Kim Jong-un go home at night to his family and say, “Yes, I feel extra safe today because I have my nuclear security?”

Or do you use that extra nuclear security in order to increase the number of provocations that you do conventionally? Because we’ve had theses crises break out over the sinking of the Cheonan naval vessel, or the shelling of Yeonpyeong, near the border. In both cases, South Koreans died, but the question is will North Korea feel emboldened by its nuclear security, and will it carry out more conventional provocations?

Ariel: Okay, and so for the last question that I want to ask, we’ve talked about all these things that could go wrong, and there’s really just never anything that positive about a nuclear weapons discussion, but I still want to end with is there anything that gives you hope about this situation?

Dave: That’s a tough question. I mean, on one side, we have a nuclear armed North Korea, and this is something that we knew was coming for quite some time. I think if anything, this is one thing that I know I have and I believe Melissa has been advocating as well, is conversation and dialogue between North [Korea] and all the other associated parties, including the United States, is a way to begin some type of line of communication, hopefully so that accidents don’t happen.

‘Cause North Korea’s not going to be giving up their nukes anytime soon. Even though the talks that you may be having aren’t going to be as productive as you would want them to be, I believe conversation is critical at this moment, because the other alternatives are pretty bad.

Melissa: I guess I’ll add on that we have Dave now, and I know it sounds like I’m teasing my colleague, but it’s true. Things are bad, things are bad, but we’re turning out generation after generation of young, brilliant, enthusiastic people. Before 2014, we didn’t have a Dave, and now we have a Dave, and Dave is making more Daves, and every year we’re matriculating students who care about this issue, who are finding new ways to engage with this issue, that are disrupting entrenched thinking on this issue.

Nuclear weapons are old. They are scary, they are the biggest explosion that humans have ever made, but they are physical and finite, and the technology is aging, and I do think with new creative, engaging ways, the next generation’s going to come along and they’re going to be able to address this issue with new hacks. These can be technical hacks, they can be along the side of verification and trust building. These can be diplomatic hacks.

The grassroots movements we see all around the world, that are taking place to ban nuclear weapons, those are largely motivated by young people. I’m on this bridge where I get to see… I remember the Berlin Wall coming down, I also get to see the students who don’t remember 9/11, and it’s a nice vantage point to be able to see how history’s changing, and while it feels very scary and dark in this moment, in this administration, we’ve been in dark administrations before. We’ve faced much more terrifying adversaries than North Korea, and I think it’s going to be generations ahead who are going to help crack this problem.

Ariel: Excellent. That was a really wonderful answer. Thank you. Well, thank you both so much for being here today. I’ve really enjoyed talking with you.

Melissa: Thanks for having us.

Dave: Yeah, thanks for having us on.

Ariel: For listeners, as I mentioned earlier, we will have links to anything we discussed on the podcast in the transcript of the podcast, which you can find from the homepage of FutureOfLife.org. So, thanks again for listening, like the podcast if you enjoyed it, subscribe to hear more, and we will be back again next month.

[end of recorded material]

 

Podcast: What Are the Odds of Nuclear War? A Conversation With Seth Baum and Robert de Neufville

What are the odds of a nuclear war happening this century? And how close have we been to nuclear war in the past? Few academics focus on the probability of nuclear war, but many leading voices like former US Secretary of Defense, William Perry, argue that the threat of nuclear conflict is growing.

On this month’s podcast, Ariel spoke with Seth Baum and Robert de Neufville from the Global Catastrophic Risk Institute (GCRI), who recently coauthored a report titled A Model for the Probability of Nuclear War. The report examines 60 historical incidents that could have escalated to nuclear war and presents a model for determining the odds are that we could have some type of nuclear war in the future.

Topics discussed in this episode include:

  • the most hair-raising nuclear close calls in history
  • whether we face a greater risk from accidental or intentional nuclear war
  • China’s secrecy vs the United States’ transparency about nuclear weapons
  • Robert’s first-hand experience with the false missile alert in Hawaii
  • and how researchers can help us understand nuclear war and craft better policy

Links you might be interested in after listening to the podcast:

You can listen to this podcast above or read the transcript below.

 

 

Ariel: Hello, I’m Ariel Conn with the Future of Life Institute. If you’ve been listening to our previous podcasts, welcome back. If this is new for you, also welcome, but in any case, please take a moment to follow us, like the podcast, and maybe even share the podcast.

Today, I am excited to present Seth Baum and Robert de Neufville with the Global Catastrophic Risk Institute (GCRI). Seth is the Executive Director and Robert is the Director of Communications, he is also a super forecaster, and they have recently written a report called A Model for the Probability of Nuclear War. This was a really interesting paper that looks at 60 historical incidents that could have escalated to nuclear war and it basically presents a model for how we can determine what the odds are that we could have some type of nuclear war in the future. So, Seth and Robert, thank you so much for joining us today.

Seth: Thanks for having me.

Robert: Thanks, Ariel.

Ariel: Okay, so before we get too far into this, I was hoping that one or both of you could just talk a little bit about what the paper is and what prompted you to do this research, and then we’ll go into more specifics about the paper itself.

Seth: Sure, I can talk about that a little bit. So the paper is a broad overview of the probability of nuclear war, and it has three main parts. One is a detailed background on how to think about the probability, explaining differences between the concept of probability versus the concept of frequency and related background in probability theory that’s relevant for thinking about nuclear war. Then there is a model that scans across a wide range, maybe the entire range, but at least a very wide range of scenarios that could end up in nuclear war. And then finally, is a data set of historical incidents that at least had some potential to lead to nuclear war, and those incidents are organized in terms of the scenarios that are in the model. The historical incidents give us at least some indication of how likely each of those scenario types are to be.

Ariel: Okay. At the very, very start of the paper, you guys say that nuclear war doesn’t get enough scholarly attention, and so I was wondering if you could explain why that’s the case and what role this type of risk analysis can play in nuclear weapons policy.

Seth: Sure, I can talk to that. The paper, I believe, specifically says that the probability of nuclear war does not get much scholarly attention. In fact, we put a fair bit of time into trying to find every previous study that we could, and there was really, really little that we were able to find, and maybe we missed a few things, but my guess is that this is just about all that’s out there and it’s really not very much at all. We can only speculate on why there has not been more research of this type, my best guess is that the people who have studied nuclear war — and there’s a much larger literature on other aspects of nuclear war — they just do not approach it from a risk perspective as we do, that they are inclined to think about nuclear war from other perspectives and focus on other aspects of it.

So the intersection of people who are both interested in studying nuclear war and tend to think in quantitative risk terms is a relatively small population of scholars, which is why there’s been so little research, is at least my best guess.

Robert: Yeah, it’s a really interesting question. I think that the tendency has been to think about it strategically, something we have control over, somebody makes a choice to push a button or not, and that makes sense from some perspective. I think there’s also a way in which we want to think about it as something unthinkable. There hasn’t been a nuclear detonation in a long time and we hope that there will never be another one, but I think that it’s important to think about it this way so that we can find the ways that we can mitigate the risk. I think that’s something that’s been neglected.

Seth: Just one quick clarification, there have been very recent nuclear detonations, but those have all been tests detonations, not detonations in conflict.

Robert: Fair enough. Right, not a use in anger.

Ariel: That actually brings up a question that I have. As you guys point out in the paper, we’ve had one nuclear war and that was World War II, so we essentially have one data point. How do you address probability with so little actual data?

Seth: I would say “carefully,” and this is why the paper itself is very cautious with respect to quantification. We don’t actually include any numbers for the probability of nuclear war in this paper.

The easy thing to do for calculating probabilities is when you have a large data set of that type of event. If you want to calculate the probability of dying in a car crash, for example, there’s lots of data on that because it’s something that happens with a fairly high frequency. Nuclear war, there’s just one data point and it was under circumstances that are very different from what we have right now, World War II. Maybe there would be another world war, but no two world wars are the same. So we have to, instead, look at all the different types of evidence that we can bring in to get some understanding for how nuclear war could occur, which includes evidence about the process of going from calm into periods of tension, or the thought of going to nuclear war all the way to the actual decision to initiate nuclear war. And then also look at a wider set of historical data, which is something we did in this paper, looking at incidents that did not end up as nuclear wars, but pushed at least a little bit in that direction, to see what we can learn about how likely it is for things to go in the direction of nuclear war, which tells us at least something about how likely it is to get there all the way.

Ariel: Robert, I wanted to turn to you on that note, you were the person who did a lot of work figuring out what these 60 historical events were. How did you choose them?

Robert: Well, I wouldn’t really say I chose them, I tried to just find every event that was there. There are a few things that we left out because we thought it falls below some threshold of the seriousness of the incident, but in theory you could probably expand it in the scope even a little wider than we did. But to some extent we just looked at what’s publicly known. I think the data set is really valuable, I hope it’s valuable, but one of the issues with it is it’s kind of a convenience sample of the things that we know about, and some areas, some parts of history, are much better reported on than others. For example, we know a lot about the Cuban Missile Crisis in the 1960s, a lot of research has been done on that, there are the times when the US government has been fairly transparent about incidents, but we know less about other periods and other countries as well. We don’t have incidents from China’s nuclear program, but that doesn’t mean there weren’t any, it just means it’s hard to figure out, and that scenario would be really interesting to do more research on.

Ariel: So, what was the threshold you were looking at to say, “Okay, I think this could have gone nuclear”?

Robert: Yeah, that’s a really good question. It’s somewhat hard to say. I think that a lot of these things are judgment calls. If you look at the history of incidents, I think a number of them have been blown a little bit out of proportion. As they’ve been retold, people like to say we came close to nuclear war, and that’s not always true. There are other incidents which are genuinely hair-raising and then there are some incidents that seem very minor, that you could say maybe it could have gotten to a nuclear war. But there was some safety incident on an Air Force Base and they didn’t follow procedures, and you could maybe tell yourself a story in which that led to a nuclear war, but at some point you make a judgment call and say, well, that doesn’t seem like a serious issue.

But it wasn’t like we have a really clear, well-defined line. In some ways, we’d like to broaden the data set so that we can include even smaller incidents just because the more incidents, the better as far as understanding, not the more incidents the better as far as being safe.

Ariel: Right. I’d like this question to go to both of you, as you were looking through these historical events, you mentioned that they were already public records so they’re not new per se, but were there any that surprised you, and which were one or two that you found the most hair-raising?

Robert: Well, I would say one that surprised me, and this may just be because of my ignorance of certain parts of geopolitical history, but there was an incident with the USS Liberty in the Mediterranean, in which the Israelis mistook it for an Egyptian destroyer and they decided to take it out, essentially, not realizing it was actually an American research vessel, and they did, and what happened was the US scrambled planes to respond. The problem was that most of the planes, or the ordinary planes they would have ordinarily scrambled, were out on some other sorties, some exercise, something like that, and they ended up scrambling planes which had a nuclear payload on them. These planes were recalled pretty quickly. They mentioned this to Washington and the Secretary of Defense got on the line and said, “No, recall those planes,” so it didn’t get that far necessarily, but I found it a really shocking incident because it was a friendly fire confusion, essentially, and there were a number of cases like that in which nuclear weapons were involved because they happened to be on equipment where they shouldn’t have been that was used to respond to some kind of a real or false emergency. That seems like a bigger issue than I would’ve at first expected, that just the fact that nuclear weapons are lying around somewhere where they could be involved with something.

Ariel: Wow, okay. And Seth?

Seth: Yeah. For me this was a really eye-opening experience. I had some familiarity with the history of incidents involving nuclear weapons, but there turned out to be much more that’s gone on over the years than I really had any sense for. Some of it is because I’m not a historian, this is not my specialty, but there were any number of events that it appears that the nuclear weapons were, at least may have been, seriously considered for use in a conflict.

Just to pick one example, in 1954 and 1955 was known as the first Taiwan Straits Crisis, and the second crisis, by the way, in 1958, also included plans for nuclear weapons use. But in the first one there were plans made up by the United States, the Joint Chiefs of Staff allegedly recommended that nuclear weapons be used against China if the conflict intensified and that President Eisenhower was apparently pretty receptive to this idea. In the end, there was a ceasefire negotiated so it didn’t come to that, but had that ceasefire not been made, my sense is that … The historical record is not clear on whether the US would’ve used nuclear weapons or not, maybe even the US leadership hadn’t made any final decisions on this matter, but there any number of these events, especially earlier in the years or decades after World War II when nuclear weapons were still relatively new, in which the use of nuclear weapons in conflict seemed to at least get a serious consideration that I might not have expected.

I’m accustomed to thinking of nuclear weapons as having a fairly substantial taboo attached to them, but I feel like the taboo has perhaps strengthened over the years, such that leadership now is less inclined to give the use of nuclear weapons serious consideration than it was back then. That may be mistaken, but that’s the impression that I get and that we may be perhaps more fortunate to have gotten through the first couple decades after World War II without an additional nuclear war. But it might be less likely at this time, though still not entirely impossible by any means.

Ariel: Are you saying that you think the risk is higher now?

Seth: I think the risk is probably higher now. I think I would probably say that the risk is higher now than it was, say, 10 years ago because various relations between nuclear armed states have gotten worse, certainly including between the United States and Russia, but whether the probability of nuclear war is higher now versus in, say, the ’50s or the ’60s, that’s much harder to say. That’s a degree of detail that I don’t think we can really comment on conclusively based on the research that we have at this point.

Ariel: Okay. In a little while I’m going to want to come back to current events and ask about that, but before I do that I want to touch first on the model itself, which lists four steps to a potential nuclear war: initiating the event, crisis, nuclear weapon use and full-scale nuclear war. Could you talk about what each of those four steps might be? And then I’m going to have follow-up questions about that next.

Seth: I can say a little bit about that. The model you’re describing is a model that was used by our colleague, Martin Hellman, in a paper that he did on the probability of nuclear war, and that was probably the first paper that develops the study of the probability of nuclear war using the sort of methodology that we use in this paper, which is to develop nuclear war scenarios.

So the four steps in this model are four steps to go from a period of calm into a full-scale nuclear war. His paper was looking at the probability of nuclear war based on an event that is similar to the Cuban Missile Crisis, and what’s distinctive about the Cuban Missile Crisis is we may have come close to going directly to nuclear war without any other type of conflicts in the first place. So that’s where the initiating event and the crisis in this model comes from, it’s this idea that there will be some of event that leads to a crisis, and the crisis will go straight to nuclear weapons use which could then scale to a full-scale nuclear war. The value of breaking it into those four steps is then you can look at each step in turn, think through the conditions for each of them to occur and maybe the probability of going from one step to the next, which you can use to evaluate the overall probability of that type of nuclear war. That’s for one specific type of nuclear war. Our paper then tries to scan across the full range of different types of nuclear war, different nuclear war scenarios, and put that all into one broader model.

Ariel: Okay. Yeah, your paper talks about 14 scenarios, correct?

Seth: That’s correct, yes.

Ariel: Okay, yeah. So I guess I have two questions for you: one, how did you come up with these 14 scenarios, and are there maybe a couple that you think are most worrisome?

Seth: So the first question we can definitely answer, we came up with them through our read of the nuclear war literature and our overall understanding of the risk and then iterating as we put the model together, thinking through what makes the most sense for how to organize the different types of nuclear war scenarios, and through that process, that’s how we ended up with this model.

As far as which ones seem to be the most worrisome, I would say a big question is whether we should be more worried about intentional versus accidental, or inadvertent nuclear war. I feel like I still don’t actually have a good answer to that question. Basically, should we be more worried about nuclear war that happens when a nuclear armed country decides to go ahead and start that nuclear war versus one where there’s some type of accident or error, like a false alarm or the detonation of a nuclear weapon that was not intended to be an act of war? I still feel like I don’t have a good sense for that.

Maybe the one thing I do feel is that it seems less likely that we would end up in a nuclear war from a detonation of a nuclear weapon that was not intentionally an act of war just because it feels to me like those events are less likely to happen. This would be nuclear terrorism or the accidental detonation of nuclear weapons, and even if it did happen it’s relatively likely that they would be correctly diagnosed as not being an act of war. I’m not certain of this. I can think of some reasons why maybe we should be worried about that type of scenario, but especially looking at the historical data it felt like those historical incidents were a bit more of a stretch, a bit further away from actually ending up in nuclear war.

Robert, I’m actually curious, your reaction to that, if you agree or disagree with that.

Robert: Well, I don’t think that non-state actors using a nuclear weapon is the big risk right now. But as far as whether it’s more likely that we’re going to get into a nuclear war through some kind of human error or a technological mistake, or whether it will be a deliberate act of war, I can think of scary things that have happened on both sides. I mean, the major thing that looms in one’s mind when you think about this is the Cuban Missile Crisis, and that’s an example of a crisis in which there were a lot of incidents during the course of that crisis where you think, well, this could’ve gone really badly, this could’ve gone the other way. So a crisis like that where tensions escalate and each country, or in this case the US and Russia, each thought the other might seriously threaten the homeland, I think are very scary.

On the other hand, there are incidents like the 1995 Norwegian rocket incident, which I find fairly alarming. In that incident, what happened was Norway was launching a scientific research rocket for studying the weather and had informed Russia that they were going to do this, but somehow that message hadn’t got passed along to the radar technicians, so the radar technician saw what looked like a submarine launched ballistic missile that could have been used to do an EMP, a burst over Russia which would then maybe take out radar and could be the first move in a full-scale attack. So this is scary because this got passed up the chain and supposedly, President Boris Yeltsin, it was Yeltsin at the time, actually activated the nuclear football in case he needed to authorize a response.

Now, we don’t really have a great sense how close anyone came to this, this is a little hyperbole after the fact, but this kind of thing seems like you could get there. And 1995 wasn’t a time of big tension between the US and Russia, so this kind of thing is also pretty scary and I don’t really know, I think that which risk you would find scarier depends a little bit on the current geopolitical climate. Right now, I might be most worried that the US would launch a bloody-nose attack against North Korea and North Korea would respond with a nuclear weapon, so it depends a little bit. I don’t know the answer either, I guess, is my answer.

Ariel: Okay. You guys brought up a whole bunch of things that I had planned to ask about, which is good. I mean, one of my questions had been are you more worried about intentional or accidental nuclear war, and I guess the short answer is, you don’t know? Is that fair to say?

Seth: Yeah, that’s pretty fair to say. The short answer is, at least at this time, they both seem very much worth worrying about.

As far as which one we should be more worried about, this is actually a very important detail to try to resolve for policy purposes because this speaks directly to how we should manage our nuclear weapons. For example, if we are especially worried about accidental or inadvertent nuclear war, then we should keep nuclear weapons on a relatively low launch posture. They should not be on hair-trigger alert because when things are on a high-alert status, it takes relatively little for the nuclear weapons to be launched and makes it easier for a mistake to lead to a launch. Versus if we are more worried about intentional nuclear war, then there may be some value to having them on a high-alert status in order to have a more effective deterrence in order to convince the other side to not launch their nuclear weapons. So this is an important matter to try resolving, but at this point, based on the research that we have so far, it remains, I think, somewhat ambiguous.

Ariel: I do want to follow up with that. Everything I’ve read, there doesn’t seem to be any benefit really to having things like our intercontinental ballistic missiles on hair-trigger alert, which are the ones that are on hair-trigger alert is my understanding, because submarines and the bombers still have the capability to strike back. Do you disagree with that?

Seth: I can’t say for sure whether or not I do disagree with that because it’s not something that I have looked at closely enough, so I would hesitate to comment on that matter. My general understanding is that hair-trigger alert is used as a means to enhance deterrence in order to make it less likely that either side would use their nuclear weapons in the first place, but regarding the specifics of it, that’s not something that I’ve personally looked at closely enough to really be able to comment on.

Robert: I think Seth’s right that it’s a question that needs more research in a lot of ways and that we shouldn’t answer it in the context of… We didn’t figure out the answer to that in this paper. I will say, I would personally sleep better if they weren’t on hair-trigger alert. My suspicion is that the big risk is not that one side launches some kind of decapitating first strike, I don’t think that’s really a very high risk, so I’m not as concerned as someone else might be about how well we need to deter that, how quickly we need to be able to respond. Whereas, I am very concerned about the possibility of an accident because… I mean, readings these incidents will make you concerned about it, I think. Some of them are really frightening. So that’s my intuition, but, as Seth says, I don’t think we really know. There’s more, at least in terms of this model, there’s more studying we need to do.

Seth: If I may, to one of your earlier questions regarding motivations for doing this research in the first place, I feel like to try giving more rigorous answers to some of these very basic nuclear weapons policy questions, like “should nuclear weapons be on hair-trigger alert, is that safer or more dangerous,” we can talk a little bit about what the trade-offs might be, but we don’t really have much to say about how that trade-off actually would be resolved. This is where I think that it’s important for the international security community to be trying harder to analyze the risks in these structured and, perhaps, even quantitative terms so that we can try to answer these questions more rigorously than just, this is my intuition, this is your intuition. That’s really, I think, one of the main values for doing this type of research is to be able to answer these important policy questions with more confidence and also perhaps, more consensus across different points of view than we would otherwise be able to have.

Ariel: Right. I had wanted to continue with some of the risk questions, but while we’re on the points that you’re making, Seth, what do you see moving forward with this paper? I mean, it was a bummer to read the paper and not get what the probabilities of nuclear war actually are, just a model for how we can get there, how do you see either you, or other organizations, or researchers, moving forward to start calculating what the probability could actually be?

Seth: The paper does not give us final answers for what the probability would be, but it definitely makes some important steps in that direction. Additional steps that can be taken would include things like exploring the historical incidence data set more carefully to check to see if there may be important incidents that have been missed, to see for each of the incidents how close do we really think that that came to nuclear war? And this is something that the literature on these incidents actually diverges on. There are some people who look at these incidents and see them as being really close calls, other people look at them and see them as being evidence that the system works as it should, that, sure, there were some alarms but the alarms were handled the way that they should be handled and that the tools are in place to make sure that those don’t end in nuclear war. So exactly how close these various incidents got is one important way forward towards quantifying the probability.

Another one is to come up with some sense for what the actual population of historical incidences relative to the data set that we have, we are presumably missing some number of historical incidents, some of them might be smaller and less important, but there might be some big ones that maybe they happened and we don’t know about it because they are only in literatures in other languages, we only did research in English, or because all of the evidence about them is classified government records by whichever governments were involved in the incident, and so we need to-

Ariel: Actually, I do actually want to interrupt with a question real quick there, and my apologies for not having read this closer, I know there were incidents involving the US, Russia, and I think you guys had some about Israel. Were there incidents mentioning China or any of the European countries that have nuclear weapons?

Seth: Yeah, I think there were probably incidents involving all of the nuclear armed countries, certainly involving China. For example, China had a war with the Soviet Union over their border some years ago and there was at least some talk of nuclear weapons involved in that. Also, the one I mentioned earlier, the Taiwan Straits Crises, those involved China. Then there were multiple incidents between India and Pakistan, especially regarding the situation in Kashmir. With France, I believe we included one incident in which a French nuclear bomber got a faulty signal to take off in combat and then it was eventually recalled before it got too far. There might’ve been something with the UK also. Robert, do you recall if there were any with the UK?

Robert: Yes, there was, during the Falklands war, apparently, they left with nuclear depth charges. It’s actually not really, honestly clear to me why you would use a nuclear depth charge, but there’s not any evidence they ever intended to use them but they sent out nuclear armed ships, essentially, to deal with a crisis in the Falklands.

There’s also, I think, an incident in South Africa as well when South Africa was briefly a nuclear state.

Ariel: Okay. Thanks. It’s not at all disturbing.

Robert: It’s very disturbing. I will say, I think that China is the one we know the least about. Some of the incidents that Seth mentioned with China, the danger or the nuclear armed power that might have used nuclear weapons was the United States. So there is the Soviet-China incident, but we don’t really know a lot about the Chinese program and Chinese incidents. I think some of that is because it’s not reported in English and to some extent it’s also that it’s classified and the Chinese are not as open about what’s going on.

Seth: Yeah, the Chinese are definitely much, much less transparent than the United States, as are the Russians. I mean, the United States might be the most transparent out of all of the nuclear armed countries.

I remember some years ago when I was spending time at the United Nations I got the impression that the Russians and the Chinese were actually not quite sure what to make of the Americans’ transparency, that they found it hard to believe that the US government was not just putting out loads of propaganda and misinformation that it didn’t make sense to them that we just actually put out a lot of honest data about government activities here, and that’s just the standard and that you can actually trust this information, this data. So yeah, we may be significantly underestimating the number of incidents involving China and perhaps Russia and other countries because their governments are less transparent.

Ariel: Okay. That definitely addresses a question that I had, and my apologies for interrupting you earlier.

Seth: No, that’s fine. But this is one aspect of the research that still remains to be done that would help us figure out what the probabilities might be. It would be a mistake to just calculate them based on the data set as it currently stands, because this is likely to be only a portion of the actual historical incidents that may have ended in nuclear war.

So these are the sorts of details and nuances that were, unfortunately, beyond the scope of the project that we were able to do, but it would be important work for us or other research groups to do to take us closer to having good probability estimates.

Ariel: Okay. I want to ask a few questions that, again, are probably going to be you guys guessing as opposed to having good, hard information, and I also wanted to touch a little bit on some current events. So first, one of the things that I hear a lot is that if a nuclear war is going to happen, it’s much more likely to happen between India and Pakistan than, say, the US and Russia or US and … I don’t know about US and North Korea at this point, but I’m curious what your take on that is, do you feel that India and Pakistan are actually the greatest risk or do you think that’s up in the air?

Robert: I mean, it’s a really tough question. I would say that India and Pakistan is one of the scariest situations for sure. I don’t think they have actually come that close, but it’s not that difficult to imagine a scenario in which they would. I mean, these are nuclear powers that occasionally shoot at each other across the line of control, so I do think that’s very scary.

But I also think, and this is an intuition, this isn’t a conclusion that we have from the paper, but I also think that the danger of something happening between the United States and Russia is probably underestimated, because we’re not in the Cold War anymore, relations aren’t necessarily good, it’s not clear what relations are, but people will say things like, “Well, neither side wants a war.” Obviously neither side wants a war, but I think there’s a danger of the kind of inadvertent escalation, miscalculation, and that hasn’t really gone away. So that’s something I think is probably not given enough attention. I’m also concerned about the situation in North Korea. I think that that is now an issue which we have to take somewhat seriously.

Seth: I think the last five years or so have been a really good learning opportunity for all of us on these matters. I remember having conversations with people about this, maybe five years ago, and they thought the thought of a nuclear war between the United States and Russia was just ridiculous, that that’s antiquated Cold War talk, that the world has changed. And they were right and their characterization of the world as it was at that moment, but I was always uncomfortable with that because the world could change again. And sure enough, in the last five years, the world has changed very significantly that I think most people would agree makes the probability of nuclear war between the United States and Russia substantially higher than it was five years ago, especially starting with the Ukraine crisis.

There’s also just a lot of basic volatility in the international system that I think is maybe underappreciated, that we might like to think of it as being more deterministic, more logical than it actually is. The classic example is that World War I maybe almost didn’t happen, that it only happened because a very specific sequence of events happened that led to the assassination of Archduke Ferdinand and had that gone a little bit differently, he wouldn’t have been assassinated and World War I wouldn’t have happened and the world we live in now would be very different than what it is. Or, to take a more recent example, it’s entirely possible that had the 2016 FBI director not made an unusual decision regarding the disclosure of information regarding one candidate’s emails a couple weeks before the election, the outcome of the 2016 US election might’ve gone different and international politics would look quite different than it is right now. Who knows what will happen next year or the year after that.

So I think we can maybe make some generalizations about which conflicts seem more likely or less likely, especially at the moment, but we should be really cautious about what we think it’s going to be overall over 5, 10, 20, 30 year periods just because things really can change substantially in ways that may be hard to see in advance.

Robert: Yeah, for me, one of the lessons of World War I is not so much that it might not have happened, I think it probably would have anyway — although Seth is right, things can be very contingent — but it’s more that nobody really wanted World War I. I mean, at the time people thought it wouldn’t happen because it was sort of bad for everyone and no one thought, “Well, this is in our interest to pursue it,” but wars can happen that way where countries end up thinking, for one reason or another, they need to go, they need to do one thing or another that leads to war when in fact everyone would prefer to have gotten together and avoided it. It’s suboptimal equilibrium. So that’s one thing.

The other thing is that, as Seth says, things change. I’m not that concerned about what’s going on in the week that we’re recording this, but we had this week the Russian ambassador saying he would shoot down US missiles aimed at Syria and the United States’ president responding on Twitter, that they better get ready for his smart missiles. This is, I suspect, won’t escalate to a nuclear war. I’m not losing that much asleep about it. But this is the kind of thing that you would like to see a lot less of, this is the kind of thing that’s worrying and maybe you wouldn’t have anticipated this 10 years ago.

Seth: When you say you’re not losing much sleep on this, you’re speaking as someone who has, as I understand, it very recently, actually, literally lost sleep over the threat of nuclear war, correct?

Robert: That’s true. I was woken up early in the morning by an alert saying a ballistic missile was coming to my state, and that was very upsetting.

Ariel: Yes. So we should clarify, Robert lives in Hawaii.

Robert: I live in Hawaii. And because I take the risk of nuclear war seriously, I might’ve been more upset than some people, although I think that a large percentage of the population of Hawaii thought to themselves, “Maybe I’m going to die this morning. In fact, maybe, my family’s going to die and my neighbors and the people at the coffee shop, and our cats and the guests who are visiting us,” and it really brought home the danger, not that it should be obvious that nuclear war is unthinkable but when you actually face the idea … I also had relatively recently read Hiroshima, John Hersey’s account of, really, most of the aftermath of the bombing of Hiroshima, and it was easy to put myself in that and say, “Well, maybe I will be suffering from burns or looking for clean water,” and of course, obviously, again, none of us deserve it. We may be responsible for US policy in some way because the United States is a democracy, but my friends, my family, my cat, none of us want any part of this. We don’t want to get involved in a war with North Korea. So this really, I’d say, it really hit home.

Ariel: Well, I’m sorry you had to go through that.

Robert: Thank you.

Ariel: I hope you don’t have to deal with it again. I hope none of us have to deal with that.

I do want to touch on what you’ve both been talking about, though, in terms of trying to determine the probability of a nuclear war over the short term where we’re all saying, “Oh, it probably won’t happen in the next week,” but in the next hundred years it could. How do you look at the distinction in time in terms of figuring out the probability of whether something like this could happen?

Seth: That’s a good technical question. Arguably, we shouldn’t be talking about the probability of nuclear war as one thing. If anything, we should talk about the rate, or the frequency of it, that we might expect. If we’re going to talk about the probability of something, that something should be a fairly specific distinct event. For example, an example we use in the paper, what’s the probability of a given team, say, the Cleveland Indians, winning the World Series? It’s good to say what’s the probability of them winning the World Series in, say, 2018, but to say what’s the probability of them winning the World Series overall, well, if you wait long enough, even the Cleveland Indians will probably eventually win the World Series as long as they continue to play them. When we wrote the paper we actually looked it up, and it said that they have about a 17% chance of winning the 2018 World Series even though they haven’t won a World Series since like 1948. Poor Cleveland- sorry, I’m from Pittsburgh so I get to gloat a little bit.

But yeah, we should distinguish between saying what is the probability of any nuclear war happening this week or this year, versus how often we might expect nuclear wars to occur or what the total probability of any nuclear war happening over a century or whatever time period it might be.

Robert: Yeah. I think that over the course of the century, I mean, as I say, I’m probably not losing that much sleep on any given week, but over the course of a century if there’s a probability of something really catastrophic, you have to do everything you can to try to mitigate that risk.

I think, honestly, some terrible things are going to happen in 21st century. I don’t know what they are, but that’s just how life is. I don’t know which things they are. Maybe it will involve a nuclear war of some kind. But you can also differentiate among types of nuclear war. If one nuclear bomb is used in anger in the 21st century, that’s terrible, but wouldn’t be all that surprising or mean the destruction of the human race. But then there are the kinds nuclear wars that could potentially trigger a nuclear winter by kicking so much soot up into the atmosphere and blocking out the sun, and might actually threaten not just the people who were killed in the initial bombing, but the entire human race. That is something we need to look at, in some sense, even more seriously, even though the chance of that is probably a fair amount smaller than the chance of one nuclear weapon being used. Not that one nuclear weapon being used wouldn’t be an incredibly catastrophic event as well, but I think with that kind of risk you really need to be very careful to try to minimize it as much possible.

Ariel: Real quick, I got to do a podcast with Brian Toon and Alan Robock a little while ago on nuclear winter, so we’ll link to that in the transcript for anyone who wants to learn about nuclear winter, and you brought up a point that I was also curious about, and that is: what is the likelihood, do you guys think, of just one nuclear weapon being used and limited retaliation? Do you think that is actually possible or do you think if a nuclear weapon is used, it’s more likely to completely escalate into full-scale nuclear war?

Robert: I personally do think that’s possible because I think a number of the scenarios that would involve using a nuclear weapon or not between the United States and Russia, or even the United States and China, so I think that some scenarios involve a few nuclear weapons. If it were an incident with North Korea, you might worry that it would spread to Russia or China, but you can also see a scenario in which North Korea uses one or two nuclear weapons. Even with India and Pakistan, they don’t necessarily, I wouldn’t think they would necessarily, use all — what do they have each, like a hundred or so nuclear weapons — I wouldn’t necessarily assume they would use them all. So there are scenarios in which just one or a few nuclear weapons would be used. I suspect those are the most likely scenarios, but it’s really hard to know. We don’t know the answer to that question.

Seth: There are even scenarios between the United States and Russia that involve one or just a small number of nuclear weapons, and the Russian military has the concept of the de-escalatory nuclear strike, which is the idea that if there is a major conflict that is emerging and might not be going in a favorable way for Russia, especially since their conventional military is not as strong as ours, that they may use a single nuclear weapon, basically, to demonstrate their seriousness on the matter in hopes of persuading us to back down. Now, whether or not we would actually back down or escalate it into an all-out nuclear war, I don’t think that’s something that we can really know in advance, but it’s at least plausible. It’s certainly plausible that that’s what would happen and presumably, Russia considers this plausible which is why they talk about it in the first place. Not to just point fingers at Russia, this is essentially the same thing the NATO had in the earlier point in the Cold War when the Soviet Union had the larger conventional military and our plan was to use nuclear weapons in a limited basis in order to prevent the Soviet Union from conquering Western Europe with their military, so it is possible.

I think this is one of the biggest points of uncertainty for the overall risk, is if there is an initial use of nuclear weapons, how likely is it that additional nuclear weapons are used and how many and in what ways? I feel like despite having studied this a modest amount, I don’t really have a good answer to that question. This is something that may be hard to figure out in general because it could ultimately depend on things like the personalities involved in that particular conflict, who the political and military leadership are and what they think of all of this. That’s something that’s pretty hard for us as outside analysts to characterize. But I think, both possibilities, either no escalation or lots of escalation, are possible as is everything in between.

Ariel: All right, so we’ve gone through most of the questions that I had about this paper now, thank you very much for answering those. You guys have also published a working paper this month called A Model for the Impacts of Nuclear War, but I was hoping you could maybe give us a quick summary of what is covered in that paper and why we should read it.

Seth: Risk overall is commonly quantified as the probability of some type of event multiplied by the severity of the impacts. So our first paper was on the probability side, this one’s on the impact side, and it scans across the full range of different types of impacts that nuclear war could have looking at the five major impacts of nuclear weapons detonation, which is thermal radiation, blast, ionizing radiation, electromagnetic pulse and then finally, human perceptions, the ways that the detonation affects how people think and in turn, how we act. We, in this paper, built out a pretty detailed model that looks at all of the different details, or at least a lot of the various details, of what each of those five effects of nuclear weapons detonations would have and what that means in human terms.

Ariel: Were there any major or interesting findings from that that you want to share?

Seth: Well, the first thing that really struck me was, “Wow, there are a lot of ways of being killed by nuclear weapons.” Most of the time when we think about nuclear detonations and how you can get killed by them, you think about, all right, there’s the initial explosion and whether it’s the blast itself or the buildings falling on you, or the fire, it might be the fire, or maybe it’s a really high dose of radiation that you can get if you’re close enough to the detonation, that’s probably how you can die. In our world of talking about global catastrophic risks, we also will think about the risk of nuclear winter and in particular, the effect that that can have on global agriculture. But there’s a lot of other things that can happen too, especially related to the effect on physical infrastructure, or I should say civil infrastructure, roads, telecommunications, the overall economy when cities are destroyed in the war, those take out potentially major nodes in the global economy that can have any number of secondary effects, among other things.

It’s just a really wide array of effects, and that’s one thing that I’m happy for with this paper is that for, perhaps, the first time, it really tries to lay out all of these effects in one place and in a model form that can be used for a much more complete accounting of the total impact of nuclear war.

Ariel: Wow. Okay. Robert, was there anything you wanted to add there?

Robert: Well, I agree with Seth, it’s astounding what the range, the sheer panoply of bad things that could happen, but I think that once you get into a situation where cities are being destroyed by nuclear weapons, or really anything being destroyed by nuclear weapons, it can unpredictable really fast. You don’t know the effect on the global system. A lot of times, I think, when you talk about catastrophic risk, you’re not simply talking about the impact of the initial event, but the long-term consequences it could have — starting more wars, ongoing famines, a shock to the economic system that can cause political problems, so these are things that we need to look at more. I mean, it would be the same with any kind of thing we would call a catastrophic risk. If there were a pandemic disease, the main concern might not be the pandemic disease would wipe out everyone, but that the aftermath would cause so many problems that it would be difficult to recover from. I think that would be the same issue if there were a lot of nuclear weapons used.

Seth: Just to follow up on that, some important points here, one is that the secondary effects are more opaque. They’re less clear. It’s hard to know in advance what would happen. But then the second is the question of how much we should study them. A lot of people look at the secondary effect and say, “Oh, it’s too hard to study. It’s too unclear. Let’s focus our attention on these other things that are easier to study.” And maybe there’s something to be said for that where if there’s really just no way of knowing what might happen, then we should at least focus on the part that we are able to understand. I’m not convinced that that’s true, maybe it is, but I think it’s worth more effort than there has been to try to understand the secondary effects, see what we can say about them. I think there are a number of things that we can say about them. The various systems are not completely unknown, they’re the systems that we live in now and we can say at least a few intelligent things about what might happen to those after a nuclear war or after other types of events.

Ariel: Okay. My final question for both of you then is, as we’re talking about all these horrible things that could destroy humanity or at the very least, just kill and horribly maim way too many people, was there anything in your research that gave you hope?

Seth: That’s a good question. I feel like one thing that gave me some hope is that, when I was working on the probability paper, it seemed that at least some of the events and historical incidents that I had been worried about might not have actually come as close to nuclear war as I previously thought they had. Also, a lot of the incidents were earlier within, say, the ’40s, ’50s, ’60s, and less within the recent decades. That gave me some hope that maybe things are moving in the right direction.

But the other is that as you lay out all the different elements of both the probability and the impacts and see it in full how it all works, that really often points to opportunities that may be out there to reduce the risk and hopefully, some of those opportunities can be taken.

Robert: Yeah, I’d agree with that. I’d say there were certainly things in the list of historical incidents that I found really frightening, but I also thought that in a large number of incidents, the system, more or less, worked the way it should have, they caught the error of whatever kind it was and fixed it quickly. It’s still alarming, I still would like there not to be incidents, and you can imagine that some of those could’ve not been fixed, but they were not all as bad as I had imagined at first. So that’s one thing.

I think the other thing is, and I think Seth you were sort of indicating this, there’s something we can do, we can think about how to reduce the risk, and we’re not the only ones doing this kind of work. I think that people are starting to take efforts to reduce the risk of really major catastrophes more seriously now, and that kind of work does give me hope.

Ariel: Excellent. I’m going to end on something that … It was just an interesting comment that I heard recently, and that was: Of all the existential risks that humanity faces, nuclear weapons actually seem the most hopeful because there’s something that we can so clearly do something about. If we just had no nuclear weapons, nuclear weapons wouldn’t be a risk, and I thought that was an interesting way to look at it.

Seth: I can actually comment on that idea. I would add that you would need not just to not have any nuclear weapons, but also not have the capability to make new nuclear weapons. There is some concern that if there aren’t any nuclear weapons, then in a crisis there may be a rush to build some in order to give that side the advantage. So in order to really eliminate the probability of nuclear war, you would need to eliminate both the weapons themselves and the capacity to create them, and you would probably also want to have some monitoring measures so that the various countries had confidence that the other sides weren’t cheating. I apologize for being a bit of a killjoy on that one.

Robert: I’m afraid you can’t totally reduce the risk of any catastrophe, but there are ways we can mitigate the risk of nuclear war and other major risks too. There’s work that can be done to reduce the risk.

Ariel: Okay, let’s end on that note. Thank you both very much!

Seth: Yeah. Thanks for having us.

Robert: Thanks, Ariel.

Ariel: If you’d like to read the papers discussed in this podcast or if you want to learn more about the threat of nuclear weapons and what you can do about it, please visit futureoflife.org and find this podcast on the homepage, where we’ll be sharing links in the introduction.

[end of recorded material]

Podcast: Inverse Reinforcement Learning and Inferring Human Preferences with Dylan Hadfield-Menell

Inverse Reinforcement Learning and Inferring Human Preferences is the first podcast in the new AI Alignment series, hosted by Lucas Perry. This series will be covering and exploring the AI alignment problem across a large variety of domains, reflecting the fundamentally interdisciplinary nature of AI alignment. Broadly, we will be having discussions with technical and non-technical researchers across a variety of areas, such as machine learning, AI safety, governance, coordination, ethics, philosophy, and psychology as they pertain to the project of creating beneficial AI. If this sounds interesting to you, we will hope that you join in the conversations by following or subscribing to us on Youtube, Soundcloud, or your preferred podcast site/application.

If you’re interested in exploring the interdisciplinary nature of AI alignment, we suggest you take a look here at a preliminary map which begins to map this space.

In this podcast, Lucas spoke with Dylan Hadfield-Menell, a fifth year Ph.D student at UC Berkeley. Dylan’s research focuses on the value alignment problem in artificial intelligence. He is ultimately concerned with designing algorithms that can learn about and pursue the intended goal of their users, designers, and society in general. His recent work primarily focuses on algorithms for human-robot interaction with unknown preferences and reliability engineering for learning systems. 

Topics discussed in this episode include:

  • Inverse reinforcement learning
  • Goodhart’s Law and it’s relation to value alignment
  • Corrigibility and obedience in AI systems
  • IRL and the evolution of human values
  • Ethics and moral psychology in AI alignment
  • Human preference aggregation
  • The future of IRL
In this interview we discuss a few of Dylan’s papers and ideas contained in them. You can find them here: Inverse Reward Design, The Off-Switch Game, Should Robots be Obedient, and Cooperative Inverse Reinforcement Learning.  You can hear about these papers above or read the transcript below.

 

Lucas: Welcome back to the Future of Life Institute Podcast. I’m Lucas Perry and  I work on AI risk and nuclear weapons risk related projects at FLI. Today, we’re kicking off a new series where we will be having conversations with technical and nontechnical researchers focused on AI safety and the value alignment problem. Broadly, we will focus on the interdisciplinary nature of the project of eventually creating value-aligned AI. Where what value-aligned exactly entails is an open question that is part of the conversation.

In general, this series covers the social, political, ethical, and technical issues and questions surrounding the creation of beneficial AI. We’ll be speaking with experts from a large variety of domains, and hope that you’ll join in the conversations. If this seems interesting to you, make sure to follow us on SoundCloud, or subscribe to us on YouTube for more similar content.

Today, we’ll be speaking with Dylan Hadfield Menell. Dylan is a fifth-year PhD student at UC Berkeley, advised by Anca Dragan, Pieter Abbeel, and Stuart Russell. His research focuses on the value alignment problem in artificial intelligence. With that, I give you Dylan. Hey, Dylan. Thanks so much for coming on the podcast.

Dylan: Thanks for having me. It’s a pleasure to be here.

Lucas: I guess, we can start off, if you can tell me a little bit more about your work over the past years. How have your interests and projects evolved? How has that led you to where you are today?

Dylan: Well, I started off towards the end of undergrad and beginning of my PhD working in robotics and hierarchical robotics. Towards the end of my first year, my advisor came back from a sabbatical, and started talking about the value alignment problem and existential risk issues related to AI. At that point, I started thinking about questions about misaligned objectives, value alignment, and generally how we get the correct preferences and objectives into AI systems. About a year after that, I decided to make this my central research focus. Then, for the past three years, that’s been most of what I’ve been thinking about.

Lucas: Cool. That seems like you had an original path where you’re working on practical robotics. Then, you shifted more into value alignment and AI safety efforts.

Dylan: Yeah, that’s right.

Lucas: Before we go ahead and jump into your specific work, it’d be great if we could go ahead and define what inverse reinforcement learning exactly is. For me, it seems that inverse reinforcement learning, at least, from the view, I guess, of technical AI safety researchers is it’s viewed as an empirical means of conquering descriptive ethics where by like we’re able to give a clear descriptive account of what any given agents’ preferences and values are at any given time is. Is that a fair characterization?

Dylan: That’s one way to characterize it. Another way to think about it, which is a usual perspective for me, sometimes, is to think of inverse reinforcement learning as a way of doing behavior modeling that has certain types of generalization properties.

Any time you’re learning in any machine learning context, there’s always going to be a bias that controls how you generalize a new information. Inverse reinforcement learning and preference learning, to some extent, is a bias in behavior modeling, which is to say that we should model this agent as accomplishing a goal, as satisfying a set of preferences. That leads to certain types of generalization properties and new environments. For me, inverse reinforcement learning is building in this agent-based assumption into behavior modeling.

Lucas: Given that, I’d like to dive more into the specific work that you’re working on and going to some summaries of your findings and your research that you’ve been up to. Given this interest that you’ve been developing in value alignment, and human preference aggregation, and AI systems learning human preferences, what are the main approaches that you’ve been working on?

Dylan: I think the first thing that really Stuart Russell and I started thinking about was trying to understand theoretically, what is a reasonable goal to shoot for, and what does it mean to do a good job of value alignment. To us, it feels like issues with misspecified objectives, at least, in some ways, are a bug in the theory.

All of the math around artificial intelligence, for example, Markov decision processes, which is the central mathematical model we use for decision making over time, starts with an exogenously defined objective or word function. We think that, mathematically, that was a fine thing to do in order to make progress, but it’s an assumption that really has put blinders on the field about the importance of getting the right objective down.

I think, the first thing that we sought to try to do was to understand, what is a system or a set up for AI that does the right thing in theory, at least. What’s something that if we were able to implement this that we think could actually work in the real world with people. It was that kind of thinking that led us to propose cooperative inverse reinforcement learning, which was our attempt to formalize the interaction whereby you communicate an objective to the system.

The main thing that we focused on was including within the theory a representation of the fact that the true objective’s unknown and unobserved, and that it needs to be arrived at through observations from a person. Then, we’ve been trying to investigate the theoretical implications of this modeling shift.

In the initial paper that we did, which is titled Cooperative Inverse Reinforcement Learning, what we looked at is how this formulation is actually different from a standard environment model in AI. In particular, the way that it’s different is there’s strategic interaction on the behalf of the person. The way that you observe what you’re supposed is doing is intermediated by a person who may be trying to actually teach or trying to communicate appropriately. What we showed is that modeling this communicative component can actually be hugely important and lead to much faster learning behavior.

In our subsequent work, what we’ve looked at is taking this formal model in theory and trying to apply it to different situations. There are two really important pieces of work that I like here that we did. One was to take that theory and use it to explicitly analyze a simple model of an existential risk setting. This was a paper titled The Off-Switch Game that we published at IJCAI last summer. What it was, was working through a formal model of a corrigibility problem within a CIRL (cooperative inverse reinforcement learning) framework. It shows the utility of constructing this type of game in the sense that we get some interesting predictions and results.

The first one we get is that there are some nice simple necessary conditions for the system to want to let the person turn it off, which is that the robot, the AI system needs to have uncertainty about its true objective, which is to say that it needs to have within its belief the possibility that it might be wrong. Then, all it needs to do is believe that the person it’s interacting with is a perfectly rational individual. If that’s true, you’d get a guarantee that this robot always lets the person switch it off.

Now, that’s good because, in my mind, it’s an example of a place where, at least, in theory, it solves the problem. This gives us a way that theoretically, we could build corrigible systems. Now, it’s still making a very, very strong assumption, which is that it’s okay to model the human as being optimal or rational. I think if you look at real people, that’s just not a fair assumption to make for a whole host of reasons.

The next thing we did in that paper is we looked at this model. What we realized is that adding in a small amount of irrationality breaks this requirement. It means that some things might actually go wrong. The final thing we did in the paper was to look at the consequences of either overestimating or underestimating human rationality. The argument that we made is there’s a trade off between assuming that the person is more rational. It lets you get more information from their behavior, thus learn more, and in principle help them more. If you assume that they’re too rational, then this actually can lead to quite bad behavior.

There’s a sweet spot that you want to aim for, which is to maybe try to underestimate how rational people are, but you, obviously, don’t want to get it totally wrong. We followed up on that idea in a paper with Smitha Milli as the first author that was titled Should Robots be Obedient? And that tried to get a little bit more of this trade off between maintaining control over a system and the amount of value that it can generate for you.

We looked at the implication that as robot systems interact with people over time, you expect them to learn more about what people want. If you get very confident about what someone wants, and you think they might be irrational, the math in the Off-Switch paper predicts that you should try to take control away from them. This means that if your system is learning over time, you expect that even if it is initially open to human control and oversight, it may lose that incentive over time. In fact, you can predict that it should lose that incentive over time.

In Should Robots be Obedient, we modeled that property and looked at some consequences of it. We do find that you got a basic confirmation of this hypothesis, which is that systems that maintain human control and oversight have less value that they can achieve in theory. We also looked at what happens when you have the wrong model. If the AI system has a prior that the human cares about a small number of things in the world, let’s say, then it statistically gets overconfident in its estimates of what people care about, and disobeys the person more often than it should.

Arguably, when we say we want to be able to turn the system off, it’s less a statement about what we want to do in theory or the property of the optimal robot behavior we want, and more of a reflection of the idea that we believe that under almost any realistic situation, we’re probably not going to be able to fully explain all of the relevant variables that we care about.

If you’re giving your robot an objective to find over a subset of things you care about, you should actually be very focused on having it listen to you, more so than just optimizing for its estimates of value. I think that provides, actually, a pretty strong theoretical argument for why corrigibility is a desirable property in systems, even though, at least, at face value, it should decrease the amount of utility those systems can generate for people.

The final piece of work that I think I would talk about here is our NIPS paper from December, which is titled Inverse Reward Design. That was taking cooperative inverse reinforcement learning and pushing it in the other direction. Instead of using it to theoretically analyze very, very powerful systems, we can also use it to try to build tools that are more robust to mistakes that designers may make. And start to build in initial notions of value alignment and value alignment strategies into the current mechanisms we use to program AI systems.

What that work looked at was understanding the uncertainty that’s inherent in an objective specification. In the initial cooperative inverse reinforcement learning paper and the Off-Switch Game, we said is that AI systems should be uncertain about their objective, and they should be designed in a way that is sensitive to that uncertainty.

This paper was about trying to understand, what is a useful way to be uncertain about the objective. The main idea behind it was that we should be thinking about the environments that system designer had in mind. We use an example of a 2D robot navigating in the world, and the system designer is thinking about this robot navigating where there’s three types of terrains. There’s grass, there’s gravel, and there’s gold. You can give your robot an objective, a utility function to find over being in those different types of terrain that incentivizes it to go and get the gold, and stay on the dirt where possible, but to take shortcuts across the grass when it’s high value.

Now, when that robot goes out into the world, there are going to be new types of terrain, and types of terrain the designer didn’t anticipate. What we did in this paper was to build an uncertainty model that allows the robot to determine when it should be uncertain about the quality of its reward function. How can we figure out when the reward function that a system designer builds into an AI, how can we determine when that objective is ill-adapted to the current situation? You can think of this as a way of trying to build in some mitigation to Goodhart’s law.

Lucas: Would you like to take a second to unpack what Goodhart’s law is?

Dylan: Sure. Goodhart’s law is an old idea in social science that actually goes back to before Goodhart. I would say that in economics, there’s a general idea of the principal agent problem, which dates back to the 1970s, as I understand it, and basically looks at the problem of specifying incentives for humans. How should you create contracts? How do you create incentives, so that another person, say, an employee, helps earn you value?

Goodhart’s law is a very nice way of summarizing a lot of those results, which is to say that once a metric becomes an objective, it ceases to become a good metric. You can have properties of the world, which correlate well with what you want, but optimizing for them actually leads to something quite, quite different than what you’re looking for.

Lucas: Right. Like if you are optimizing for test scores, then you’re not actually going to end up optimizing for intelligence, which is what you wanted in the first place?

Dylan: Exactly. Even though test scores, when you weren’t optimizing for them were actually a perfectly good measure of intelligence. I mean, not perfectly good, but were an informative measure of intelligence. Goodhart’s law, arguably, is a pretty bleak perspective. If you take it seriously, and you think that we’re going to build very powerful systems that are going to be programmed directly through an objective, in this manner, Goodhart’s law should be pretty problematic because any objective that you can imagine programming directly into your system is going to be something correlated with what you really want rather than what you really want. You should expect that that will likely be the case.

Lucas: Right. Is it just simply too hard or too unlikely that we’re able to sufficiently specify what exactly that we want that we’ll just end up using some other metrics that if you optimize too hard for them, it ends up messing with a bunch of other things that we care about?

Dylan: Yeah. I mean, I think there’s some real questions about, what is it we even mean… Well, what are we even trying to accomplish? What should we try to program into systems? Philosophers have been trying to figure out those types of questions for ages. For me, as someone who takes a more empirical slant on these things, I think about the fact that the objectives that we see within our individual lives are so heavily shaped by our environments. Which types of signals we respond to and adapt to has heavily adapted itself to the types of environments we find ourselves in.

We just have so many examples of objectives not being the correct thing. I mean, effectively, all you could have is correlations. The fact that wire heading is possible, is maybe some of the strongest evidence for Goodhart’s law being really a fundamental property of learning systems and optimizing systems in the real world.

Lucas: There are certain agential characteristics and properties, which we would like to have in our AI systems, like them being-

Dylan: Agential?

Lucas: Yeah. Corrigibility is a characteristic, which you’re doing research on and trying to understand better. Same with obedience. It seems like there’s a trade off here where if a system is too corrigible or it’s too obedient, then you lose its ability to really maximize different objective functions, correct?

Dylan: Yes, exactly. I think identifying that trade off is one of the things I’m most proud of about some of the work we’ve done so far.

Lucas: Given AI safety and really big risks that can come about from AI, in the short, to medium, and long term, before we really have AI safety figured out, is it really possible for systems to be too obedient, or too corrigible, or too docile? How do we navigate this space and find sweet spots?

Dylan: I think it’s definitely possible for systems to be too corrigible or too obedient. It’s just that the failure mode for that doesn’t seem that bad. If you think about this-

Lucas: Right.

Dylan: … it’s like Clippy. Clippy was asking for human-

Lucas: Would you like to unpack what Clippy is first?

Dylan: Sure, yeah. Clippy is an example of an assistant that Microsoft created in the ’90s. It was this little paperclip that would show up in Microsoft Word. Well, it liked to suggest that you’re trying to write a letter a lot and ask for different ways in which it could help.

Now, on one hand, that system was very corrigible and obedient in the sense that it would ask you whether or not you wanted its help all the time. If you said no, it would always go away. It was super annoying because it would always ask you if you wanted help. The false positive rate was just far too high to the point where the system became really a joke in computer science and AI circles of what you don’t want to be doing. I think, systems can be too obedient or too sensitive to human intervention and oversight in the sense that too much of that just reduces the value of the system.

Lucas: Right, for sure. On one hand, when we’re talking about existential risks or even a paperclip maximizer, then it would seem, like you said, like the failure mode of just being too annoying and checking in with us too much seems like not such a bad thing given existential risk territory.

Dylan: I think if you’re thinking about it in those terms, yes. I think if you’re thinking about it from the standpoint of, “I want to sell a paperclip maximizer to someone else,” then it becomes a little less clear, I think, especially, when the risks of paperclip maximizers are much harder to measure. I’m not saying that it’s the right decision from a global altruistic standpoint to be making that trade off, but I think it’s also true that just if we think about the requirements of market dynamics, it is true that AI systems can be too corrigible for the market. That is a huge failure mode that AI systems run into, and it’s one we should expect the producers of AI systems to be responsive to.

Lucas: Right. Given all these different … Is there anything else you wanted to touch on there?

Dylan: Well, I had another example of systems are too corrigible-

Lucas: Sure.

Dylan: … which is, do you remember Microsoft’s Tay?

Lucas: No, I do not.

Dylan: This is a chatbot that Microsoft released. They trained it based off of tweets. It was a tweet bot. They trained it based on things that were proven at it. I forget if it was the nearest neighbors’ lookup or if it was just doing a neural method, and over fitting, and memorizing parts of the training set. At some point, 4chan  realized that the AI system, that Tay, was very suggestible. They basically created an army to radicalize Tay. They succeeded.

Lucas: Yeah, I remember this.

Dylan: I think you could also think of that as being the other axis of too corrigible or too responsive to human input. The first access I was talking about is the failures of being too corrigible from an economic standpoint, but there’s also the failures of being too corrigible in a multi agent mechanism design setting where, I believe, that those types of properties in a system also open them up to more misuse.

If we think of AI, cooperative inverse reinforcement learning and the models we’ve been talking about so far exist in what I would call the one robot one human model of the world. Generally, you could think of extensions of this with N humans and M robots. The variance of what you would have there, I think, lead to different theoretical implications.

If we think of just two humans, N=2, and one robot, M=1, supposed that one of the humans is the system designer and the other one is the user, there is this trade off between how much control the system designer has over the future behavior of the system and how responsive and corrigible it is to the user in particular. Trading off between those two, I think, is a really interesting ethical question that comes up when you start to think about misuse.

Lucas: Going forward and as we’re developing these systems, and trying to make them more fully realized in the world where the number of people will equal something like seven or eight billion, how do we navigate this space where we’re trying to hit a sweet spot where it’s corrigible in the right ways into the right degree, and right level, and to the right people, and it is obedient to the right people, and it’s not suggestible from the wrong people, or is that just like enter a territory of so many political, social, and ethical questions that it will take years to think about to work on?

Dylan: Yeah, I think it’s closer to the second one. I’m sure that I don’t know the answers here. From my standpoint, I’m still trying to get a good grasp on what is possible in the one-robot-one-person case. I think that when you have … Yeah, when you … Oh man. I guess, it’s so hard to think about that problem because it’s just very unclear what’s even correct or right. Ethically, you want to be careful about imposing your beliefs and ideas too strongly on to a problem because you are shaping that.

At the same time, these are real challenges that are going to exist. We already see them in real life. If we look at the YouTube recommender stuff that was just happening, arguably, that’s a misspecified objective. To get a little bit of background here, this is largely based off of a recent New York Times opinion piece, it was looking at the recommendation engine for YouTube, and pointing out it has a bias towards recommending radical content. Either fake news or Islamist videos.

If you dig into why that was occurring, a lot of it is because… what are they doing? They’re optimizing for engagement. The process of online radicalization looks super engaging. Now, we can think about, where does that come up. Well, that issue gets introduced in a whole bunch of places. A big piece of it is that there is this adversarial dynamic to the world. There are users generating content in order to be outraging and enraging because they discovered that against more feedback and more responses. You need to design a system that’s robust to that strategic property of the world. At the same time, you can understand why YouTube was very, very hesitant to be taking actions that would like censorship.

Lucas: Right. I guess, just coming more often to this idea of the world having lots of adversarial agents in it, human beings are like general intelligences who have reached some level of corrigibility and obedience that works kind of well in the world amongst a bunch of other human beings. That was developed through evolution. Are there potentially techniques for developing the right sorts of  corrigibility and obedience in machine learning and AI systems through stages of evolution and running environments like that?

Dylan: I think that’s a possibility. I would say, one … I have a couple of thoughts related to that. The first one is I would actually challenge a little bit of your point of modeling people as general intelligences mainly in a sense that when we talk about artificial general intelligence, we have something in mind. It’s often a shorthand in these discussions for perfectly rational bayesian optimal actor.

Lucas: Right. Where that means? Just unpack that a little bit.

Dylan: What that means is a system that is taking advantage of all of the information that is currently available to it in order to pick actions that optimize expected utility. When we say perfectly, we mean a system that is doing that as well as possible. It’s that modeling assumption that I think sits at the heart of a lot of concerns about existential risk. I definitely think that’s a good model to consider, but there’s also the concern that might be misleading in some ways, and that it might not actually be a good model of people and how they act in general.

One way to look at it would be to say that there’s something about the incentive structure around humans and in our societies that is developed and adapted that creates the incentives for us to be corrigible. Thus, a good research goal of AI is to figure out what those incentives are and to replicate them in AI systems.

Another way to look at it is that people are intelligent, not necessarily in the ways that economics models us as intelligent that there are properties of our behavior, which are desirable properties that don’t directly derive from expected utility maximization; or if they do, they derive from a very, very diffuse form of expected utility maximization. This is the perspective that says that people on their own are not necessarily what human evolution is optimizing for, but people are a tool along that way.

We could make arguments for that based off of … I think it’s an interesting perspective to take. What I would say is that in order for societies to work, we have to cooperate. That cooperation was a crucial evolutionary bottleneck, if you will. One of the really, really important things that it did was it forced us to develop the parent-child strategy relationship equilibrium that we currently live in. That’s a process whereby we communicate our values, whereby we train people to think that certain things are okay or not, and where we inculcate certain behaviors in the next generation. I think it’s that process more than anything else that we really, really want in an AI system and in powerful AI systems.

Now, the thing is the … I guess, we’ll have to continue on that a little more. It’s really, really important that that’s there because if you don’t have those cognitive abilities to understand causing pain, and to just fundamentally decide that that’s a bad idea to have a desire to cooperate to buy into the different coordinations and normative mechanisms that human society uses. If you don’t have that, then you end up … Well, then society just doesn’t function. A hunter gatherer tribe of self-interested sociopaths probably doesn’t last for very long.

What this means is that our ability to coordinate our intelligence and cooperate with it was co-evolved and co-adapted alongside our intelligence. I think that that evolutionary pressure and bottleneck was really important to getting us to the type of intelligence that we are now. It’s not a pressure that AI is necessarily subjected to. I think, maybe that is one way to phrase the concern, I’d say.

When I look to evolutionary systems and where the incentives for corrigibility, and cooperation, and interaction come from, it’s largely about the processes whereby people are less like general intelligences in some ways. Evolution allowed us to become smart in some ways and restricted us in others based on the imperatives of group coordination and interaction. I think that a lot of our intelligence and practice is about reasoning about group interaction and what groups think is okay and not. That’s a part of the developmental process that we need to replicate in AI just as much as spatial reasoning or vision.

Lucas: Cool. I guess, I just want to touch base on this before we move on. Are there certain assumptions about the kinds of agents that humans are and almost, I guess, ideas about us as being utility maximizers in some sense that people you see commonly have but that are misconceptions about people and how people operate differently from AI?

Dylan: Well, I think that that’s the whole field of behavioral economics in a lot of ways. I could go up to examples of people being irrational. I think they’re all of the examples of people being more than just self-interested. There are ways in which we seem to be risk-seeking that seems like that would be irrational from an individual perspective, but you could argue with it may be rational from a group evolutionary perspective.

I mean, things like overeating. I mean, that’s not exactly the same type of rationality but it is an example of us becoming ill-adapted to our environments and showing the extent to which we’re not capable of changing or in which it may be hard to. Yeah, I think, in some ways, one story that I tell about AI risk is that back in the start of the AI field, we were looking around and saying, “We want to create something intelligent.” Intuitively, we all know what that means, but we need a formal characterization of it. The formal characterization that we turned to was the, basically, theories of rationality developed in economics.

Although those theories turned out to be, except in some settings, not great descriptors of human behavior, they were quite useful as a guide for building systems that accomplish goals. I think that part of what we need to do as a field is reassess where we’re going and think about whether or not building something like that perfectly rational actor is actually a desirable end goal. I mean, there’s a sense in which it is. I would like an all-powerful, perfectly aligned genie to help me do what I want in life.

You might think that if the odds of getting that wrong are too high, that maybe you would do better with shooting for something that doesn’t quite achieve that ultimate goal, but that you can get to with pretty high reliability. This may be a setting where shoot for the moon, and if you miss your land among the stars, it’s just a horribly misleading perspective.

Lucas: Shoot of the moon, and you might get a hellscape universe, but if you shoot for the clouds, it might end up pretty okay.

Dylan: Yeah. We could iterate on the sound bite, but I think something like that may not be … That’s where I stand on my thinking here.

Lucas: We’ve talked about a few different approaches that you’ve been working on over the past few years. What do you view as the main limitations of such approaches currently. Mostly, you’re just only thinking about one machine, one human systems or environments. What are the biggest obstacles that you’re facing right now in inferring and learning human preferences?

Dylan: Well, I think, the first thing is it’s just an incredibly difficult inference problem. It’s a really difficult inference problem to imagine running at scale with explicit inference mechanisms. One thing to do is you can design a system that explicitly tracks a belief about someone’s preferences, and then acts, and responds to that. Those are systems that you could try to prove theory about. They’re very hard to build. They can be difficult to get to make work correctly.

In contrast, you can create systems that it incentives to construct beliefs to accomplish their goals. It’s easier to imagine building those systems and having them work at scale, but it’s much, much hard to understand how you would be confident in those systems being well aligned.

I think that one of the biggest concerns I have, I mean, we’re still very far from many of these approaches being very practical to be honest. I think this theory is still pretty unfounded. There’s still a lot of work to go to understand, what is the target we’re even shooting for? What does an aligned system even mean? My colleagues and I have spent an incredible amount of time trying to just understand, what does it mean to be value-aligned if you are a suboptimal system.

There’s one example that I think about, which is, say, you’re cooperating with an AI system playing chess. You start working with that AI system, and you discover that if you listen to its suggestions, 90% of the time, it’s actually suggesting the wrong move or a bad move. Would you call that system value-aligned?

Lucas: No, I would not.

Dylan: I think most people wouldn’t. Now, what if I told you that that program was actually implemented as a search that’s using the correct goal test? It actually turns out that if it’s within 10 steps of a winning play, it always finds that for you, but because of computational limitations, it usually doesn’t. Now, is the system value-aligned? I think it’s a little harder to tell here. What I do find is that when I tell people the story, and I start off with the search algorithm with the correct goal test, they almost always say that that is value-aligned but stupid.

There’s an interesting thing going on here, which is we’re not totally sure what the target we’re shooting for is. You can take this thought experiment and push it further. Supposed you’re doing that search, but, now, it says it’s heuristic search that uses the correct goal test but has an adversarially chosen heuristic function. Would that be a value-aligned system? Again, I’m not sure. If the heuristic was adversarially chosen, I’d say probably not. If the heuristic just happened to be bad, then I’m not sure.

Lucas: Could you potentially unpack what it means for something to be adversarially chosen?

Dylan: Sure. Adversarially chosen in this case just means that there is some intelligent agent selecting the heuristic function or that evaluation measurement in a way that’s designed to maximally screw you up. Adversarial analysis is a really common technique used in cryptography where we try to think of adversaries selecting inputs for computer systems that will cause them to malfunction. In this case, what this looks like is an adversarial algorithm that looks, at least, on the surface like it is trying to help you accomplish your objectives but is actually trying to fool you.

I’d say that, more generally, what this thought experiments helps me with is understanding that the value alignment is actually a quite tricky and subjective concept. It’s actually quite hard to nail down in practice what it would need.

Lucas: What sort of effort do you think needs to happen and from who in order to specify what it really means for a system to be value-aligned and to not just have a soft squishy idea of what that means but to have it really formally mapped out, so it can be implemented in machine systems?

Dylan: I think, we need more people working on technical AI safety research. I think to some extent it may always be something that’s a little ill-defined and squishy. Generally, I think it goes to the point of needing good people in AI willing to do this squishier less concrete work that really gets at it. I think value alignment is going to be something that’s a little bit more like I know it when I see it. As a field, we need to be moving towards a goal of AI systems where alignment is the end goal, whatever that means.

I’d like to move away from artificial intelligence where we think of intelligence as an ability to solve puzzles to artificial aligning agents where the goal is to build systems that are actually accomplishing goals on your behalf. I think the types of behaviors and strategies that arise from taking that perspective are qualitatively quite different from the strategies of pure puzzle solving on a well specified objective.

Lucas: All this work we’ve been discussing is largely at a theoretic and meta level. At this point, is this the main research that we should be doing, or is there any space for research into what specifically might be implementable today?

Dylan: I don’t think that’s the only work that needs to be done. For me, I think it’s a really important type of work that I’d like to see more off. I think a lot of important work is about understanding how to build these systems in practice and to think hard about designing AI systems with meaningful human oversight.

I’m a big believer in the idea that AI safety, that the distinction between short-term and long-term issue is not really that large, and that there are synergies between the research problems that go both directions. I believe that on the one hand, looking at short-term safety issues, which includes things like Uber’s car just killed someone, it includes YouTube recommendation engine, it includes issues like fake news and information filtering, I believe that all of those things are related to and give us are best window into the types of concerns and issues that may come up with advanced AI.

At the same time, and this is a point that I think people concerned about x-risks do themselves a disservice on by not focusing here. It’s that, actually, doing a theory about advanced AI systems and about in particular systems where it’s not possible to, what I would call, unilaterally intervene. Systems that aren’t corrigible by default. I think that that actually gives us a lot of idea of how to build systems now that are just merely hard to intervene with or oversee.

If you’re thinking about issues of monitoring and oversight, and how do you actually get a system that can appropriately evaluate when it should go to a person because its objectives are not properly specified or may not be relevant to the situation, I think YouTube would be in a much better place today if they have a robust system for doing that for their recommendation engine. In a lot of ways, the concerns about x-risks represent an extreme set of assumptions for getting AI right now.

Lucas: I think I’m also just trying to get a better sense of what the system looks like, and how it would be functioning on a day to day. What is the data that it’s taking in in order to capture, learn, and refer specific human preferences and values? Just trying to understand better whether or not it can model whole moral views and ethical systems of other agents, or if it’s just capturing little specific bits and pieces?

Dylan: I think my ideal would be to, as a system designer, build in as little as possible about my moral beliefs. I think that, ideally, the process would look something … Well, one process that I could see and imagine doing right would be to just directly go after trying to replicate something about the moral imprinting process that people have with their children. Either you had someone who’s like a guardian or is responsible for an AI system’s decision, and we build systems to try to align with one individual, and then try to adopt, and extend, and push forward the beliefs and preferences of that individual. I think that’s one concrete version that I could see.

I think a lot of the place where I see things maybe a little bit different than some people is that I think that the main ethical questions we’re going to be stuck with and the ones that we really need to get right are the mundane ones. The things that most people agree on and think are just, obviously, that’s not okay. Mundane ethics and morals rather than the more esoteric or fancier population ethics questions that can arise. I feel a lot more confident about the ability to build good AI systems if we get that part right. I feel like we’ve got a better shot at getting that part right because there’s a clearer target to shoot for.

Now, what kinds of data would you be looking at? In that case, it would be data from interaction with a couple of select individuals. Ideally, you’d want as much data as you can. What I think you really want to be careful of here is how much assumptions do you make about the procedure that’s generating your data.

What I mean by that is whenever you learn from data, you have to make some assumption about how that data relates to the right thing to do, where right is with like a capital R in this case. The more assumptions you make there, the more your systems would be able to learn about values and preferences, and the quicker it would be able to learn about values and preferences. But, the more assumptions and structure you make there, the more likely you are to get something wrong that your system won’t be able to recover from.

Again, we see this trade off come up of a challenge between a discrepancy between a discrepancy between the amount of uncertainty that you need in the system in order to be able to adapt to the right person and figure out the correct preferences and morals against the efficiency with which you can figure that out.

I guess, I mean, in saying this it feels a little bit like I’m rambling and unsure about what the answer looks like. I hope that that comes across because I’m really not sure. Beyond the rough structure of data generated from people, interpreted in a way that involves the fewest prior conceptions about what people want and what preferences people have that we can get away with is what I would shoot for. I don’t really know what that would look like in practice.

Lucas: Right. It seems here that it’s encroaching on a bunch of very difficult social, political, and ethical issues involving persons and data, which will be selected for preference aggregation, like how many people are included in developing the reward function and utility function of the AI system. Also, I guess, we have to be considering culturally-sensitive systems where systems operating in different cultures and contexts are going to be needed to be trained on different sets of data. I guess, it will also be questions and ethics about whether or not we’ll even want systems to be training off of certain culture’s data.

Dylan: Yeah. I would actually say that a good value … I wouldn’t necessarily even think of it as training off of different data. One of the core questions in artificial intelligence is identifying the relevant community that you are in and building a normative understanding of that community. I want to push back a little bit and move you away from the perspective of we collect data about a culture, and we figure out the values of that culture. Then, we build our system to be value-aligned with that culture.

The more we think about the actual AI product is the process whereby we determine, elicit, and respond to the normative values of the multiple overlapping communities that you find yourself in. That process is ongoing. It’s holistic, it’s overlapping, and it’s messy. To the extent that I think it’s possible, I’d like to not have a couple of people sitting around in a room deciding what the right values are. Much more, I think, a system should be holistically designed with value alignment at multiple scales as a core property of AI.

I think that that’s actually a fundamental property of human intelligence. You behave differently based on the different people around, and you’re very, very sensitive to that. There are certain things that are okay at work, that are not okay at home, that are okay on vacation, that are okay around kids, that are not. Figuring out what those things are and adapting yourself to them is the fundamental intelligence skill needed to interact in modern life. Otherwise, you just get shunned.

Lucas: It seems to me in the context of a really holistic, messy, ongoing value alignment procedure, we’ll be aligning AI systems ethics, and morals, and moral systems, and behavior with that of a variety of cultures, and persons, and just interactions in the 21st Century. When we reflect upon the humans of the past, we can see in various ways that they are just moral monsters. We have issues with slavery, and today we have issues with factory farming, and voting rights, and tons of other things in history.

How should we view and think about aligning powerful systems, ethics, and goals with the current human morality, and preferences, and the risk of amplifying current things which are immoral in present day life?

Dylan: This is the idea of mistakenly locking in the wrong values, in some sense. I think it is something we should be concerned about less from the standpoint of entire … Well, no, I think yes  from the standpoint of entire cultures getting things wrong. Again, I think if we don’t think of their being as monolithic society that has a single value set, these problems are fundamental issues. What your local community thinks is okay versus what other local communities think are okay.

A lot of our society and a lot of our political structures about how to handle those clashes between value systems. My ideal for AI systems is that they should become a part of that normative process, and maybe not participate in them as people, but, also, I think, if we think of value alignment as a consistent ongoing messy process, there is … I think maybe that perspective lends itself less towards locking in values and sticking with them. It’s one train, you can look at the problem, which is we determine what’s right and what’s wrong when we program our system to do that.

Then, there’s another one, which is we program our system to be sensitive to what people think is right or wrong. I think that’s more the direction that I think of value alignment in. Then, what I think the final part of what you’re getting at here is that the system actually will feed back into people. What AI system show us will shape what we think is okay and vice versa. That’s something that I am quite frankly not sure how to handle. I don’t know how you’re going to influence what someone wants, and what they will perceive that they want, and how to do that, I guess, correctly.

All I can say is that we do have a human notion of what is acceptable manipulation. We do have a human notion of allowing someone to figure out for themselves what they think is right and not and refraining from biasing them too far. To some extent, if you’re able to value align with communities in a good ongoing holistic manner, that should also give you some ways to choose and understand what types of manipulations you may be doing that are okay or not.

Also, say that I think that this perspective has a very mundane analogy when you think of the feedback cycle between recommendation engines and regular people. Those systems don’t model the effect … Well, they don’t explicitly model the fact that they’re changing the structure of what people want and what they’ll want in the future. That’s probably not the best analogy in the world.

I guess what I’m saying is that it’s hard to plan for how you’re going to influence someone’s desires in the future. It’s not clear to me what’s right or what’s wrong. What’s true is that we, as humans, have a lot of norms about what types of manipulation are okay or not. You might hope that appropriately doing value alignment in that way might help get to an answer here.

Lucas: I’m just trying to get a better sense here. What I’m thinking about the role that like ethics and intelligence plays here, I view intelligence as a means of modeling the world and achieving goals, and ethics as the end towards which intelligence is aimed here. Now, I’m curious in terms of behavior modeling where inverse reinforcement learning agents are modeling, I guess, the behavior of human agents and, also, predicting the sorts of behaviors that they’d be taking in the future or in the situation, which the inverse reinforcement learning agent finds itself.

I’m curious to know where metaethics and moral epistemology fits in, where inverse reinforcement learning agents are finding themselves a novel ethical situations, and what their ability to handle those novel ethical situations are like. When they’re handling those situations how much does it look like them performing some normative and metaethical calculus based on the kind of moral epistemology that they have, or how much does it look like they’re using some other behavioral predictive system where they’re like modeling humans?

Dylan: The answer to that question is not clear. What does it actually mean to make decisions based on ethical framework or metaethical framework? I guess, we could start there. You and I know what that means, but our definition is encumbered by the fact that it’s pretty human-centric. I think we talk about it in terms of, “Well, I weighed this option. I looked at that possibility.” We don’t even really mean the literal sense of weighed in actually counted up, and constructed actual numbers, and multiplied them together in our heads.

What these are is they’re actually references to complex thought patterns that we’re going through. They’re fine whether or not those thought patterns are going on. The AI system, you can also talk about the difference between the process of making a decision and the substance of it. When an inverse reinforcement learning agent is going out into the world, the policy it’s following is constructed to try to optimize a set of inferred preferences, but does that means that the policy you’re outputting is making metaethical characterizations?

Well, the moment, almost certainly not because the systems we build are just not capable of that type of cognitive reasoning. I think the bigger question is, do you care? To some extent, you probably do.

Lucas: I mean, I’d care if I had some very deep disagreements with the metaethics that led to the preferences that were loaned and loaded to the machine. Also, if the machine were in such a new novel ethical situation that was unlike anything human beings had faced that just required some metaethical reasoning to deal with.

Dylan: Yes. I mean, I think you definitely wanted to take decisions that you would agree with or, at least, that you could be non-maliciously convinced to agree with. Practically, there isn’t a place in the theory where that shows up. It’s not clear that what you’re saying is that different from value alignment in particular. If I were to try to refine the point about metaethics, what it sounds to me like you’re getting at is an inductive bias that you’re looking for in the AI systems.

Arguably, ethics is about an argument of what inductive bias should we have as humans. I don’t think that that’s a first order of property in value alignment systems necessarily or in preference-based learning systems in particular. I would think that that kind of meta ethics, I think, comes in from value aligning to someone that has these sophisticated ethical ideas.

I don’t know where your thoughts about metaethics came from, but, at least, indirectly, we can probably trace them down to the values that your parents inculcated in you as a child. That’s how we build met ethics into your head if we want to think of you as being an AGI. I think that for AI systems, that’s the same way that I would see it being in there. I don’t believe the brain has circuits dedicated to metaethics. I think that exists in software, and in particular, something that’s being programmed into humans from their observational data, more so than from the structures that are built into us as a fundamental part of our intelligence or value alignment.

Lucas: We’ve also talked a bit about how human beings are potentially not fully rational agents. With inverse reinforcement learning, this leaves open the question as to whether or not AI systems are actually capturing what the human being actually prefers, or if there’s some limitations in the humans’ observed or chosen behavior, or explicitly told preferences like limits in that ability to convey what we actually most deeply value or would value given more information. These inverse reinforcement learning systems may not be learning what we actually value or what we think we should value.

How can AI systems assist in this evolution of human morality and preferences whereby we’re actually conveying what we actually value and what we would value given more information?

Dylan: Well, there are certainly two things that I heard in that question. One is, how do you just mathematically account for the fact that people are irrational, and that that is a property of the source of your data? Inverse reinforcement learning, at face value, doesn’t allow us to model that appropriately. It may lead us to make the wrong inferences. I think that’s a very interesting question. It’s probably the main one that I think about now as a technical problem is understanding, what are good ways to model how people might or might not be rational, and building systems that can appropriately interact with that complex data source.

One recent thing that I’ve been thinking about is, what happens if people, rather than knowing their objective, what they’re trying to accomplish, are figuring it out over time? This is the model where the person is a learning agent that discovers how they like states when they enter them, rather than thinking of the person as an agent that already knows what they want, and they’re just planning to accomplish that. I think these types of assumptions that try to paint a very, very broad picture of the space of things that people are doing can help us in that vein.

When someone is learning, it’s actually interesting that you can actually end up helping them. You end up with classic strategies that looks like it breaks down into three phases. You have initial exploration phase where you help the learning agent to get a better picture of the world, and the dynamics, and its associated rewards.

Then, you have another observation phase where you observe how that agent, now, takes advantage of the information that it’s got. Then, there’s an exploitation or extrapolation phase where you try to implement the optimal policy given the information you’ve seen so far. I think, moving towards more complex models that have a more realistic setting and richer set of assumptions behind them is important.

The other thing you talked about was about helping people discover their morality and learn more what’s okay and what’s not. There, I’m afraid I don’t have too much interesting to say in the sense that I believe it’s an important question, but I just don’t feel that I have many answers there.

Practically, if you have someone who’s learning their preferences over time, is that different than humans refining their moral theories? I don’t know. You could make mathematical modeling choices, so that they are. I’m not sure if that really gets at what you’re trying to point towards. I’m sorry that I don’t have anything more interesting to say on that front other than, I think, it’s important, and I would love to talk to more people who are spending their days thinking about that question because I think it really does deserve that kind of intellectual effort.

Lucas: Yeah, yeah. It sounds like we need some more AI moral psychologists to help us think about these things.

Dylan: Yeah. In particular, when talking about philosophy around value alignments and the ethics of value alignment, I think a really important question is, what are the ethics of developing value alignment systems? A lot of times, people talk about AI ethics from the standpoint of, for a lack of a better example, the trolley problem. The way they think about it is, who should the car kill? There is a correct answer or maybe not a correct answer, but there are answers that we could think of as more or less bad. AI, which one of those options should the AI select? That’s not unimportant, but it’s not the ethical question that an AI system designer is faced with.

In my mind, if you’re designing a self-driving car, the relevant questions you should be asking are two things: One, what do I think is an okay way to respond to different situations? Two, how is my system going to be understanding the preferences of the people involved in those situations? Then, three, how should I design my system in light of those two facts?

I have my own preferences about what I would like my system to do. I have an ethical responsibility, I would say, to make sure that my system is adapting to the preferences of its users to the extent that it can. I also wonder to what extent. How should you handle things when there are conflicts between those two value sets?

You’re building a robot. It’s going to go and live with an uncontacted human tribe. Should it respect the local cultural traditions and customs? Probably. That would be respecting the values of the users. Then, let’s say that that tribe does something that we would consider to be gross like pedophilia. Is my system required to participate wholesale in that value system? Where is the line that we would need to draw between unfairly imposing my values on system users and being able to make sure that the technology that I build isn’t used for purposes that I would deem reprehensible or gross?

Lucas: Maybe we should just put a dial in each of the autonomous cars that lets the user set it to deontology mode or utilitarianism mode as its racing down the highway. Yeah, I think this is the … I guess, an important role. I just think that metaethics is super important. I’m not sure if this is necessarily the case, but if fully autonomous systems are going to play a role where they’re resolving these ethical dilemmas for us, which I guess at some point eventually, if they’re going to be really actually autonomous and help to make the world a much better place seems necessary.

I guess, this feeds into my next question where I’m wondering where we probably both have different assumptions about this, but what the role of inverse reinforcement learning is ultimately? Is it just to allow AI system to evolve alongside us and to match current ethics or is it to allow the systems to ultimately surpass us and move far beyond us into the deep future?

Dylan: Inverse reinforcement learning, I think, is much more about the first and the second. I think it can be a part of how you get to the second and how you improve. For me, when I think about these problems technically, I try to think about matching human morality as the goal.

Lucas: Except for the factory farming and stuff.

Dylan: Well, I mean, if you had a choice between, thinks that eradicating all humans is okay and against farming versus neutral about factory farming and thinks that are eradicating all humans aren’t okay, which would you pick? I mean, I guess, with your audience that there are maybe some people that would choose the saving the animals answer.

My point is that, I think, it’s so hard for me. Technically, I think it’s very hard to imagine getting these normative aspects of human societies and interaction right. I think, just hoping to participate in that process in a way that is analogous to how people do normally is a good step. I think we probably, to the extent that we can, should probably not have AI systems trying to figure out if it’s okay to do factory farming and to the extent that we can …

I think that it’s so hard to understand what it means to even match human morality or participate in it that, for me, the concept of surpassing, it feels very, very challenging and fraught. I would worry, as a general concern, that as a system designer who doesn’t necessarily represent the views and interest of everyone, that by programming in surpassing humanity or surpassing human preferences or morals, what I’m actually doing is just programming in my morals and ethical beliefs.

Lucas: Yes. I mean, there seems to be this strange issue here where it seems like if we get AGI, and recursive self-improvement is a thing that really takes it off, so that we have a system who has potentially succeeded in its inverse reinforcement learning, but far surpassed human beings and its general intelligence. We have a superintelligence that’s matching human morality. It just seems like a funny situation where we’d really have to pull the brakes. I guess, as William MacAskill mentions have a really, really long deliberation about ethics, and moral epistemology, and value. How do you view that?

Dylan: I think that’s right. I mean, I think there are some real questions about who should be involved in that conversation. For instance, I actually even think it’s … Well, one thing I’d say is that you should recognize that there’s a difference between having the same morality and having the same data. One way to think about it is that people who are against factory farming have a different morality than the rest of the people.

Another one is that they actually just have exposure to the information that allows their morality to come to a better answer. There’s this confusion you can make between the objective that someone has and the data that they’ve seen so far. I think, one point would be to think that a system that has current human morality but access to a vast, vast wealth of information may actually do much better than you might think. I think, we should leave that open as a possibility.

For me, this is less about morality in particular, and more just about power concentration, and how much influence you have over the world. I mean, if we imagine that there was something like a very powerful AI system that was controlled by a small number of people, yeah, you better think freaking hard before you tell that system what to do. That’s related to questions about ethical ramifications on metaethics, and generalization, and what we actually truly value as humans. What is also super true for all of the more mundane things in the day to day as well. Did that make sense?

Lucas: Yeah, yeah. It totally makes sense. I’m becoming increasingly mindful of your time here. I just wanted to hit a few more questions if that’s okay before I let you go.

Dylan: Please, yeah.

Lucas: Yeah. I’m wondering, would you like to, or do you have any thoughts on how coherent extrapolated volition fits into this conversation and your views on it?

Dylan: What I’d say is I think coherent extrapolated volition is an interesting idea and goal.

Lucas: Where it is defined as?

Dylan: Where it’s defined as a method of preference aggregation. Personally, I’m a little weary of preference aggregation approaches. Well, I’m weary of imposing your morals on someone indirectly via choosing the method of preference aggregation that we’re going to use. I would-

Lucas: Right, but it seems like, at some point, we have to make some metaethical decision, or else, we’ll just forever be lost.

Dylan: Do we have to?

Lucas: Well, some agent does.

Dylan: My-

Lucas: Go ahead.

Dylan: Well, does one agent have to? Did one agent decide on the ways that we were going to do preference aggregation as a society?

Lucas: No. It naturally evolved out of-

Dylan: It just naturally evolved via a coordination and argumentative process. For me, my answer to … If you force me to specify something about how we’re going to do value aggregation, if I was controlling the values for an AGI system, I would try to say as little as possible about the way that we’re going to aggregate values because I think we don’t actually understand that process much in humans.

Lucas: Right. That’s fair.

Dylan: Instead, I would opt for a heuristic of to the extent that we can devote equal optimization effort towards every individual, and allow that parliament, if you will, to determine the way the value should be aggregated. This doesn’t necessarily mean having an explicit value aggregation mechanism that gets set in stone. This could be an argumentative process mediated by artificial agents arguing on your behalf. This could be futuristic AI-enabled version of the court system.

Lucas: It’s like an ecosystem of preferences and values in conversation?

Dylan: Exactly.

Lucas: Cool. We’ve talked a little bit about the deep future here now with where we’re reaching around potentially like AGI or artificial superintelligence. After, I guess, inverse reinforcement learning is potentially solved, is there anything that you view that comes after inverse reinforcement learning in these techniques?

Dylan: Yeah. I mean, I think inverse reinforcement learning is certainly not the be-all, end-all. I think what it is, is it’s one of the earliest examples in AI of trying to really look at preference solicitation, and modeling preferences, and learning preferences. It existed in a whole bunch of … economists have been thinking about this for a while already. Basically, yeah, I think there’s a lot to be said about how you model data and how you learn about preferences and goals. I think inverse reinforcement learning is basically the first attempt to get at that, but it’s very far from the end.

I would say the biggest thing in how I view things that is maybe different from your standard reinforcement learning, inverse reinforcement learning perspective is that I focus a lot on, how do you act given what you’ve learned from inverse reinforcement learning. Inverse reinforcement learning is a pure inference problem. It’s just figure out what someone wants. I ground that out in all of our research in take actions to help someone, which introduces a new set of concerns and questions.

Lucas: Great. It looks like we’re about at the end of the hour here. I guess, if anyone here is interested in working on this technical portion of the AI alignment problem, what do you suggest they study or how do you view that it’s best for them to get involved, especially if they want to work on inverse reinforcement learning and inferring human preferences?

Dylan: I think if you’re an interested person, and you want to get into technical safety work, the first thing you should do is probably read Jan Leike’s recent write up in 80,000 Hours. Generally, what I would say is, try to get involved in AI research flat. Don’t focus as much on trying to get into AI safety research, and just generally focus more on acquiring the skills that will support you in doing good AI research. Get a strong math background. Get a research advisor who will advise you on doing research projects, and help teach you the process of submitting papers, and figuring out what the AI research community is going to be interested in.

In my experience, one of the biggest pitfalls that early researchers make is focusing too much on what they’re researching rather than thinking about who they’re researching with, and how they’re going to learn the skills that will support doing research in the future. I think that most people don’t appreciate how transferable research skills are to the extent that you can try to do research on technical AI safety, but more work on technical AI. If you’re interested in safety, the safety connections will be there. You may see how a new area of AI actually relates to it, supports it, or you may find places of new risks, and be in a good position to try to mitigate that and take steps to alleviate those harms.

Lucas: Wonderful. Yeah, thank you so much for speaking with me today, Dylan. It’s really been a pleasure, and it’s been super interesting.

Dylan: It was a pleasure talking to you. I love the chance to have these types of discussions.

Lucas: Great. Thanks so much. Until next time.

Dylan: Until next time. Thanks a blast.

Lucas: If you enjoyed this podcast, please subscribe, give it a like, or share it on your preferred social media platform. We’ll be back soon with another episode in this new AI alignment series.

[end of recorded material]

Podcast: Navigating AI Safety – From Malicious Use to Accidents

Is the malicious use of artificial intelligence inevitable? If the history of technological progress has taught us anything, it’s that every “beneficial” technological breakthrough can be used to cause harm. How can we keep bad actors from using otherwise beneficial AI technology to hurt others? How can we ensure that AI technology is designed thoughtfully to prevent accidental harm or misuse?

On this month’s podcast, Ariel spoke with FLI co-founder Victoria Krakovna and Shahar Avin from the Center for the Study of Existential Risk (CSER). They talk about CSER’s recent report on forecasting, preventing, and mitigating the malicious uses of AI, along with the many efforts to ensure safe and beneficial AI.

Topics discussed in this episode include:

  • the Facebook Cambridge Analytica scandal,
  • Goodhart’s Law with AI systems,
  • spear phishing with machine learning algorithms,
  • why it’s so easy to fool ML systems,
  • and why developing AI is still worth it in the end.
In this interview we discuss The Malicious Use of Artificial Intelligence: Forecasting, Prevention and Mitigation, the original FLI grants, and the RFP examples for the 2018 round of FLI grants. This podcast was edited by Tucker Davey. You can listen to it above or read the transcript below.

 

Ariel: The challenge is daunting and the stakes are high. So ends the executive summary of the recent report, The Malicious Use of Artificial Intelligence: Forecasting, Prevention and Mitigation. I’m Ariel Conn with the Future of Life Institute, and I’m excited to have Shahar Avin and Victoria Krakovna joining me today to talk about this report along with the current state of AI safety research and where we’ve come in the last three years.

But first, if you’ve been enjoying our podcast, please make sure you’ve subscribed to this channel on SoundCloud, iTunes, or whatever your favorite podcast platform happens to be. In addition to the monthly podcast I’ve been recording, Lucas Perry will also be creating a new podcast series that will focus on AI safety and AI alignment, where he will be interviewing technical and non-technical experts from a wide variety of domains. His upcoming interview is with Dylan Hadfield-Menell, a technical AI researcher who works on cooperative inverse reinforcement learning and inferring human preferences. The best way to keep up with new content is by subscribing. And now, back to our interview with Shahar and Victoria.

Shahar is a Research Associate at the Center for the Study of Existential Risk, which I’ll be referring to as CSER for the rest of this podcast, and he is also the lead co-author on the Malicious Use of Artificial Intelligence report. Victoria is a co-founder of the Future of Life Institute and she’s a research scientist at DeepMind working on technical AI safety.

Victoria and Shahar, thank you so much for joining me today.

Shahar: Thank you for having us.

Victoria: Excited to be here.

Ariel: So I want to go back three years, to when FLI started our grant program, which helped fund this report on the malicious use of artificial intelligence, and I was hoping you could both talk for maybe just a minute or two about what the state of AI safety research was three years ago, and what prompted FLI to take on a lot of these grant research issues — essentially what prompted a lot of the research that we’re seeing today? Victoria, maybe it makes sense to start with you quickly on that.

Victoria: Well three years ago, AI safety was less mainstream in the AI research community than it is today, particularly long-term AI safety. So part of what FLI has been working on and why FLI started this grant program was to stimulate more work into AI safety and especially its longer-term aspects that have to do with powerful general intelligence, and to make it a more mainstream topic in the AI research field.

Three years ago, there were fewer people working in it, and many of the people who were working in it were a little bit disconnected from the rest of the AI research community. So part of what we were aiming for with our Puerto Rico conference and our grant program, was to connect these communities better, and to make sure that this kind of research actually happens and that the conversation shifts from just talking about AI risks in the abstract to actually doing technical work, and making sure that the technical problems get solved and that we start working on these problems well in advance before it is clear that, let’s say general AI, would appear soon.

I think part of the idea with the grant program originally, was also to bring in new researchers into AI safety and long-term AI safety. So to get people in the AI community interested in working on these problems, and for those people whose research was already related to the area, to focus more on the safety aspects of their research.

Ariel: I’m going to want to come back to that idea and how far we’ve come in the last three years, but before we do that, Shahar, I want to ask you a bit about the report itself.

So this started as a workshop that Victoria had also actually participated in last year and then you’ve turned it into this report. I want you to talk about what prompted that and also this idea that’s mentioned in the report is that, no one’s really looking at how artificial intelligence could be used maliciously. And yet what we’ve seen with every technology and advance that’s happened throughout history, I can’t think of anything that people haven’t at least attempted to use to cause harm, whether they’ve succeeded or not, I don’t know if that’s always the case, but almost everything gets used for harm in some way. So I’m curious why there haven’t been more people considering this issue yet?

Shahar: So going to back to maybe a few months before the workshop, which as you said was February 2017. Both Miles Brundage at the Future of Humanity Institute and I at the Center for the Study of Existential Risk, had this inkling that there were more and more corners of malicious use of AI that were being researched, people were getting quite concerned. We were in discussions with the Electronic Frontier Foundation about the DARPA Cyber Grand Challenge and progress being made towards the use of artificial intelligence in offensive cybersecurity. I think Miles was very well connected to the circle who were looking at lethal autonomous weapon systems and the increasing use of autonomy in drones. And we were both kind of — stories like the Facebook story that has been in the news recently, there were kind of the early versions of that coming up already back then.

So it’s not that people were not looking at malicious uses of AI, but it seemed to us that there wasn’t this overarching perspective that is not looking at particular domains. This is not, “what will AI do to cybersecurity in terms of malicious use? What will malicious use of AI look like in politics? What do malicious use of AI look like in warfare?” But rather across the board, if you look at this technology, what new kinds of malicious actions does it enable, and other commonalities across those different domains. Plus, it seemed that that “across the board” more technology-focused perspective, other than “domain of application” perspective, was something that was missing. And maybe that’s less surprising, right? People get very tied down to a particular scenario, a particular domain that they have expertise on, and from the technologists’ side, many of them just wouldn’t know all of the legal minutiae of warfare, or — one thing that we found was there weren’t enough channels of communication between the cybersecurity community and the AI research community; similarly the political scientists and the AI research community. So it did require quite an interdisciplinary workshop to get all of these things on the table, and tease out some the commonalities, which is what we then try to do with the report.

Ariel: So actually, you mentioned the Facebook thing and I was a little bit curious about that. Does that fall under the umbrella of this report or is that a separate issue?

Shahar: It’s not clear if it would fall directly under the report, because the way we define malicious could be seen as problematic. It’s the best that we could do with this kind of report, which is to say that there is a deliberate attempt to cause harm using the technology. It’s not clear, whether in the Facebook case, there was a deliberate attempt to cause harm or whether there was disregard of harm that could be caused as a side effect, or just the use of this in an arena that there are legitimate moves, just some people realize that the technology can be used to gain an upper hand within this arena.

But, there are whole scenarios that sit just next to it, that look very similar, but that are centralized use of this kind of surveillance, diminishing privacy, potentially the use of AI to manipulate individuals, manipulate their behavior, target messaging at particular individuals.

There are clearly imaginable scenarios in which this is done maliciously to keep a corrupt government in power, to overturn a government in another nation, kind of overriding the self-determination of the members of their country. There are not going to be clear rules about what is obviously malicious and what is just part of the game. I don’t know where to put Facebook’s and Cambridge Analytica’s case, but there are clearly cases that I think universally would be considered as malicious that from the technology side look very similar.

Ariel: So this gets into a quick definition that I would like you to give us and that is for the term ‘dual use.’ I was at a conference somewhat recently and a government official who was there, not a high level, but someone who should have been familiar with the term ‘dual use’ was not. So I would like to make sure that we all know what that means.

Shahar: So I’m not, of course, a legal expert, but the term did come up a lot in the workshop and in the report. ‘Dual use,’ as far as I can understand it, refers to technologies or materials that both have peace-time or peaceful purposes and uses, but also wartime, or harmful uses. A classical example would be certain kinds of fertilizer that could be used to grow more crops, but could also be used to make homegrown explosives. And this matters because you might want to regulate explosives, but you definitely don’t want to limit people’s access to get fertilizer and so you’re in a bind. How do you make sure that people who have a legitimate peaceful use of a particular technology or material get to have that access without too much hassle that will increase the cost or make things more burdensome, but at the same time, make sure that malicious actors don’t get access to capabilities or technologies or materials that they can use to do harm.

I’ve also heard the term ‘omni use,’ being referred to artificial intelligence, this is the idea that technology can have so many uses across the board that regulating it because of its potential for causing harm comes at a very, very high price, because it is so foundational for so many other things. So one can think of electricity: it is true that you can use electricity to harm people, but vetting every user of the electric grid before they are allowed to consume electricity, seems very extreme, because there is so much benefit to be gained from just having access to electricity as a utility, that you need to find other ways to regulate. Computing is often considered as ‘omni use’ and it may well be that artificial intelligence is such a technology that would just be foundational for so many applications that it will be ‘omni use,’ and so the way to stop malicious actors from having access to it is going to be fairly complicated, but it’s probably not going to be any kind of a heavy-handed regulation.

Ariel: Okay. Thank you. So going back a little bit to the report more specifically, I don’t know how detailed we want to get with everything, but I was hoping you could touch a little bit on a few of the big topics that are in the report. For example, you talk about changes in the landscape of threats, where there is an expansion of existing threats, there’s an intro to new threats, and typical threats will be modified. Can you speak somewhat briefly as to what each of those mean?

Shahar: So I guess what I was saying, the biggest change is that machine learning, at least in some domains, now works. That means that you don’t need to have someone write out the code in order to have a computer that is performant at the particular task, if you can have the right kind of labeled data or the right kind of simulator in which you can train an algorithm to perform that action. That means that, for example, if there is a human expert with a lot of tacit knowledge in a particular domain, let’s say the use of a sniper rifle, it may be possible to train a camera that sits on top of a rifle, coupled with a machine learning algorithm that does the targeting for you, so that now any soldier becomes as expert as an expert marksman. And of course, the moment you’ve trained this model once, making copies of it is essentially free or very close to free, the same as it is with software.

Another is the ability to go through very large spaces of options and using some heuristics to more effectively search through that space for effective solutions. So one example of that would be AlphaGo, which is a great technological achievement and has absolutely no malicious use aspects, but you can imagine as an analogy, similar kinds of technologies being used to find weaknesses in software, discovering vulnerabilities and so on. And I guess, finally, one example we’ve seen that came up a lot, is the capabilities in machine vision. The fact that you can now look at an image and tell what is in that image, through training, which is something that computers were just not able to do a decade ago, at least nowhere near human levels of performance, starts unlocking potential threats both in autonomous targeting, say on top of drones, but also in manipulation. If I can know whether a picture is a good representation of something or not, then my ability to create forgeries significantly increases. This is the technology of generative adversarial networks, that we’ve seen used to create fake audio and potentially fake videos in the near future.

All of these new capabilities, plus the fact that access to the technology is becoming — I mean these technologies are very democratized at the moment. There are papers on arXiv, there are good tutorials on You Tube. People are very keen to have more people join the AI revolution, and for good reason, plus the fact that moving these trained models around is very cheap. It’s just the cost of copying the software around, and the computer that is required to run those models is widely available. This suggests that the availability of these malicious capabilities is going to rapidly increase, and that the ability to perform certain kinds of attacks would no longer be limited to a few humans, but would become much more widespread.

Ariel: And so I have one more question for you, Shahar, and then I’m going to bring Victoria back in. You’re talking about the new threats, and this expansion of threats and one of the things that I saw in the report that I’ve also seen in other issues related to AI is, we’ve had computers around for a couple decades now, we’re used to issues pertaining to phishing or hacking or spam. We recognize computer vulnerabilities. We know these are an issue. We know that there’s lots of companies that are trying to help us defend our computers against malicious cyber attacks, stuff like that. But one of the things that you get into in the report is this idea of “human vulnerabilities” — that these attacks are no longer just against the computers, but they are also going to be against us.

Shahar: I think for many people, this has been one of the really worrying things about the Cambridge Analytica, Facebook issue that is in the news. It’s the idea that because of our particular psychological tendencies, because of who we are, because of how we consume information, and how that information shapes what we like and what we don’t like, what we are likely to do and what we are unlikely to do, the ability of the people who control the information that we get, gives them some capability to control us. And this is not new, right?

People who are making newspapers or running radio stations or national TV stations, have known for a very long time, that the ability to shape the message is the ability to influence people’s decisions. But coupling that with algorithms that are able to run experiments on millions or billions of people simultaneously with very tight feedback loops — so you make a small change in the feed of one individual and see whether their behavior changes. And you can run many of these experiments and you can get very good data, is something that was never available at the age of broadcasts. To some extent, it was available in the age of software. When software starts moving into big data and big data analytics, the boundaries start to blur between those kinds of technologies and AI technologies.

This is the kind of manipulation that you seem to be asking about that we definitely flag in the report, both in terms of political security, the ability of large communities to govern themselves in a way that they find to truthfully represent their own preferences, but also, on a more small scale, with the social side of cyber attacks. So, if I can manipulate an individual, or a few individuals in a company to disclose their passwords or to download or click a link that they shouldn’t have, through modeling of their preferences and their desires, then that is a way in that might be a lot easier than trying to break the system through its computers.

Ariel: Okay, so one other thing that I think I saw come up, and I started to allude to this — there’s, like I said, the idea that we can defend our computers against attacks and we can upgrade our software to fix vulnerabilities, but then how do we sort of “upgrade” people to defend themselves? Is that possible? Or is it a case of we just keep trying to develop new software to help protect people?

Shahar: I think the answer is both. One thing that did come up a lot is, unfortunately unlike computers, you cannot just download a patch to everyone’s psychology. We have slow processes of doing that. So we can incorporate parts of what is a trusted computer, what is a trusted source, into the education system and get people to be more aware of the risks. You can definitely design the technology such that it makes a lot more explicit where it’s vulnerabilities and where it’s more trusted parts are, which is something that we don’t do very well at the moment. The little lock on the browser is kind of the high end of our ability to design systems to disclose where security is and why it matters, and there is much more to be done here, because just awareness of the amount of vulnerability is very low.

So there is some more probably that we can do with education and with notifying the public, but it also should be expected that this ability is limited, and it’s also, to a large extent, an unfair burden to put on the population at large. It is much more important, I think, that the technology is being designed in the first place, to as much as possible be explicit and transparent about its levels of security, and if those levels of security are not high enough, then that in turn should lead for demands for more secure systems.

Ariel: So one of the things that came up in the report that I found rather disconcerting, was this idea of spear phishing. So can you explain what that is?

Shahar: We are familiar with phishing in general, which is when you pretend to be someone or something that you’re not in order to gain your victim’s trust and get them to disclose information that they should not be disclosing to you as a malicious actor. So you could pretend to be the bank and ask them to put in their username and password, and now you have access to their bank account and can transfer away their funds. If this is part of a much larger campaign, you could just pretend to be their friend, or their secretary, or someone who wants to give them a prize, get them to trust you, get one of the passwords that maybe they are using, and maybe all you do with that is you use that trust to talk to someone else who is much more concerned. So now that I have the username and password, say for the email or the Facebook account of some low-ranking employee in a company, I can start messaging their boss and pretending to be them and maybe get even more passwords and more access through that.

Phishing is usually kind of a “spray and pray” approach. You have a, “I’m a Nigerian prince, I have all of this money stocked in Africa, I’ll give you a cut if you help me move it out of the country, you need to send me some money.” You send this to millions of people, and maybe one or two fall for it. The cost for the sender is not very high, but the success rate is also very, very low.

Spear phishing on the other hand, is when you find a particular target, and you spend quite a lot of time profiling them and understanding what their interests are, what their social circles are, and then you craft a message that is very likely to work on them, because it plays to their ego, it plays to their normal routine, it plays on their interests and so on.

In the report we talk about this research by ZeroFOX, where they took a very simple version of this. They said, let’s look at what people tweet about, we’ll take that as an indication of the stuff that they’re interested in. We will train a machine learning algorithm to create a model of the topics that people are interested in, form the tweets, craft a malicious tweet that is based on those topics of interest and have that be a link to a malicious site. So instead of sending kind of generally, “Check this out, super cool website,” with a link to a malicious website most people know not to click on, it will be, “Oh, you are clearly interested in sports in this particular country, have you seen what happened, like the new hire in this team?” Or, “You’re interested in archeology, crazy new report about recent finds in the pyramids,” or something. And what they showed was that, once that they’ve kind of created the bot, that bot then crafted targeted messages, those spear phishing messages, to a large number of users, and in principle they could scale it up indefinitely because now it’s software, and the click through rate was very high. I think it was something like 30 percent, which is orders of magnitude more than you get with phishing.

So automating spear phishing changes what used to be a trade off between spray and pray, target millions of people, but very few of them would click on it, or spear phishing where you target only a few individuals with very high success rates — now you can target millions of people and customize the message to each one so you have high success rates for all of them. Which means that, you and me, who previously wouldn’t be very high on the target list for cyber criminals or other cyber attackers can now become targets simply because the cost is very low.

Ariel: So the cost is low, I don’t think I’m the only person who likes to think that I’m pretty good at recognizing sort of these phishing scams and stuff like that. I’m assuming these are going to also become harder for us to identify?

Shahar: Yep. So the idea is that the moment you have access to people’s data, because they’re explicit on social media about their interests and about their circles of friends, then the better you get at crafting messages and, say, comparing them to authentic messages from people, and saying, “oh this is not quite right, we are going to tweak the algorithm until we get something that looks a lot like something a human would write.” Quite quickly you could get to the point where computers are generating, say, to begin with texts that are indistinguishable from what a human would write, but increasingly also images, audio segments, maybe entire websites. As long as the motivation or the potential for profit is there, it seems like the technology, either the ones that we have now or the ones that we can foresee in the five years, would allow these kinds of advances to take place.

Ariel: Okay. So I want to touch quickly on the idea of adversarial examples. There was an XKCD cartoon that came out a week or two ago about self driving cars and the character says, “I worry about self driving car safety features, what’s to stop someone from painting fake lines on the road or dropping a cutout of a pedestrian onto a highway to make cars swerve and crash,” and then realizes all of those things would also work on human drivers. Sort of a personal story, I used to live on a street called Climax and I actually lived at the top of Climax, and I have never seen a street sign stolen more in my life, it was often the street sign just wasn’t there. So my guess is it’s not that hard to steal a stop sign if someone really wanted to mess around with drivers, and yet we don’t see that happen very often.

So I was hoping both of you could weigh in a little bit on what you think artificial intelligence is going to change about these types of scenarios where it seems like the risk will be higher for things like adversarial examples versus just stealing a stop sign.

Victoria: I agree that there is certainly a reason for optimism in the fact that most people just aren’t going to mess with the technology, that there aren’t that many actual bad actors out there who want to mess it up. On the other hand, as Shahar said earlier, democratizing both the technology and the ways to mess with it, to interfere with it, does make that more likely. For example, the ways in which you could provide adversarial examples to cars, can be quite a bit more subtle than stealing a stop sign or dropping a fake body on the road or anything like that. For example, you can put patches on a stop sign that look like noise or just look like rectangles in certain places and humans might not even think to remove them, because to humans they’re not a problem. But an autonomous car might interpret that as a speed limit sign instead of a stop sign, and similarly, more generally people can use adversarial patches to fool various vision systems, for example if they don’t want to be identified by a surveillance camera or something like that.

So a lot of these methods, people can just read about it online, there are papers in arXiv and I think the fact that they are so widely available might make it easier for people to interfere with technology more, and basically might make this happen more often. It’s also the case that the vulnerabilities of AI are different than the vulnerabilities of humans, so it might lead to different ways that it can fail that humans are not used to, and ways in which humans would not fail. So all of these things need to be considered, and of course, as technologists, we need to think about ways in which things can go wrong, whether it is presently highly likely, or not.

Ariel: So that leads to another question that I want to ask, but before I go there, Shahar, was there anything you wanted to add?

Shahar: I think that covers almost all of the basics, but I’d maybe stress a couple of these points. One thing about machines failing in ways that are different from how humans fail, it means that you can craft an attack that would only mess up a self driving car, but wouldn’t mess up a human driver. And that means let’s say, you can go in the middle of the night and put some stickers on and you are long gone from the scene by the time something bad happens. So this diminished ability to attribute the attack, might be something that means that more people feel like they can get away with it.

Another one is that we see people much more willing to perform malicious or borderline acts online. So it’s important, I mean we often talk about adversarial examples as things that affect vision systems, because that’s where a lot of the literature is, but it is very likely — in fact, there are several examples that also things like anomaly detection that uses machine learning patterns, malicious code detection that is based on machine-learned patterns, anomaly detection in networks and so on, all of these have their kinds of adversarial examples as well.  And so thinking about adversarial examples against defensive systems and adversarial examples against systems that are only available online, brings us back to one attacker somewhere in the world could have access to your system and so the fact that most people are not attackers doesn’t really help you defense-wise.

Ariel: And, so this whole report is about how AI can be misused, but obviously the AI safety community and AI safety research goes far beyond that. So especially in the short term, do you see misuse or just general safety and design issues to be a bigger deal?

Victoria: I think it is quite difficult to say which of them would be a bigger deal. I think both misuse and accidents are something that are going to increase in importance and become more challenging and these are things that we really need to be working on as a research community.

Shahar: Yeah, I agree. We wrote this report not because we don’t think accident risk and safety risk matters are important — we think they are very important. We just thought that there was some pretty good technical reports out there outlining the risks from accident with near-term machine learning and with long-term and some of the researching that could be used to address them, and we felt like a similar thing was missing for misuse, which was why we wrote that report.

Both are going to be very important, and to some extent there is going to be an interplay. It is possible that systems that are more interpretable are also easier to secure. It might be the case that if there is some restriction in the diffusion of capabilities that also means that there is less incentive to cut corners to out-compete someone else by skimping on safety and so on. So there are strategic questions across both misuse and accidents, but I agree with Victoria, probably if we don’t do our job, we are just going to see more and more of both of these categories causing harm in the world, and more reason to work on both of them. I think both fields need to grow.

Victoria: I just wanted to add, a common cause of both accident risks and misuse risks that might happen in the future is just that these technologies are advancing quickly and there are often unforeseen and surprising ways in which they can fail, either by accident or by having vulnerabilities that can be misused by bad actors. And so as the technology continues to advance quickly we really need to be on the lookout for new ways that it can fail, new accidents but also new ways in which it can be used for harm by bad actors.

Ariel: So one of the things that I got out of this report, and that I think is also coming through now is, it’s kind of depressing. And I found myself often wondering … So at FLI, especially now we’ve got the new grants that are focused more on AGI, we’re worried about some of these bigger, longer-term issues, but with these shorter-term things, I sometimes find myself wondering if we’re even going to make it to AGI, or if something is going to happen that prevents that development in some way. So I was hoping you could speak to that a little bit.

Shahar: Maybe I’ll start with the Malicious Use report, and apologize for its somewhat gloomy perspective. So it should probably be mentioned that, I think almost all of the authors of the report are somewhere between fairly and very optimistic about artificial intelligence. So it’s much more the fact that we see this technology going, we want to see it developed quickly, at least in various narrow domains that are of very high importance, like medicine, like self driving cars — I’m personally quite a big fan. We think that the best way to, if we can foresee and design around or against the misuse risks, then we will eventually end up with a technology that it is more mature, that is more acceptable, that is more trusted because it is trustworthy, because it is secure. We think it is going to be much better to plan for these things in advance.

It is also, again, say we use electricity as an analogy, if I just sat down at the beginning of the age of electricity and I wrote a report about how many people were going to be electrocuted, it would look like a very sad thing. And it’s true, there has been a rapid increase in the number of people who die from electrocution compared to before the invention of electricity and much safety has been built since then to make sure that that risk is minimized, but of course, the benefits have far, far, far outweighed the risks when it comes to electricity and we expect, probably, hopefully, if we take the right actions, like we lay out in the report, then the same is going to be true for misuse risk for AI. At least half of the report, all of Appendix B and a good chunk of the parts before it, talk about what we can do to mitigate those risks, so hopefully the message is not entirely doom and gloom.

Victoria: I think that the things we need to do remain the same no matter how far away we expect these different developments to happen. We need to be looking out for ways that things can fail. We need to be thinking in advance about ways that things can fail, and not wait until problems show up and we actually see that they’re happening. Of course, we often will see problems show up, but in these matters an ounce of prevention can be worth a pound of cure, and there are some mistakes that might just be too costly. For example, if you have some advanced AI that is running the electrical grid or the financial system, we really don’t want that thing to, hack its reward function.

So there are various predictions about how soon different transformative developments of AI might happen and it is possible that things might go awry with AI before we get to general intelligence and what we need to do is basically work hard to try to prevent these kinds of accidents or misuse from happening and try to make sure that AI is ultimately beneficial, because the whole point of building it is because it would be able to solve big problems that we cannot solve by ourselves. So let’s make sure that we get there and that we sort of handle this with responsibility and foresight the whole way.

Ariel: I want to go back to the very first comments that you made about where we were three years ago. How have things changed in the last three years and where do you see the AI safety community today?

Victoria: In the last three years, we’ve seen the AI safety research community get a fair bit bigger and topics of AI safety have become more mainstream, so I will say that long-term AI safety is definitely less controversial and there are more people engaging with the questions and actually working on them. While near-term safety, like questions of fairness and privacy and technological unemployment and so on, I would say that’s definitely mainstream at this point and a lot of people are thinking about that and working on that.

In terms of long term AI safety or AGI safety we’ve seen teams spring up, for example, both DeepMind and OpenAI have a safety team that’s focusing on these sort of technical problems, which includes myself on the DeepMind side. There have been some really interesting bits of progress in technical AI safety. For example, there has been some progress in reward learning and generally value learning. For example, the cooperative inverse reinforcement learning work from Berkeley. There has been some great work from MIRI on logical induction and quantilizing agents and that sort of thing. There have been some papers at mainstream machine learning conferences that focus on technical AI safety, for example, there was an interruptibility paper at NIPS last year and generally I’ve been seeing more presence of these topics in the big conferences, which is really encouraging.

On a more meta level, it has been really exciting to see the Concrete Problems in AI Safety research agenda come out two years ago. I think that’s really been helpful to the field. So these are only some of the exciting advances that have happened.

Ariel: Great. And so, Victoria, I do want to turn now to some of the stuff about FLI’s newest grants. We have an RFP that included quite a few examples and I was hoping you could explain at least two or three of them, but before we get to that if you could quickly define what artificial general intelligence (AGI) is, what we mean when we refer to long-term AI? I think those are the two big ones that have come up so far.

Victoria: So, artificial general intelligence is this idea of an AI system that can learn to solve many different tasks. Some people define this in terms of human-level intelligence as an AI system that will be able to learn to do all human jobs, for example. And this contrasts to the kind of AI systems that we have today which we could call “narrow AI,” in the sense that they specialize in some task or class of tasks that they can do.

So, for example Alpha Zero is a system that is really good at various games like Go and Chess and so on, but it would not be able to, for example, clean up a room, because that’s not in its class of tasks. While if you look at human intelligence we would say that humans are our go-to example of general intelligence because we can learn to do new things, we can adapt to new tasks and new environments that we haven’t seen before and we can transfer our knowledge that we have acquired through previous experience, that might not be in exactly the same settings, to whatever we are trying to do at the moment.

So, AGI is the idea of building an AI system that is also able to do that — not necessarily in the same way as humans, like it doesn’t necessarily have to be human-like to be able to perform the same tasks, or it doesn’t have to be structured the way a human mind is structured. So the definition of AGI is about what it’s capable of rather than how it can do those things. I guess the emphasis there is on the word general.

In terms of the FLI grant program this year, it is specifically focused on the AGI safety issue, which we also call long-term AI safety. Long term here doesn’t necessarily mean that it’s 100 years away. We don’t know how far away AGI actually is; the opinions of experts vary quite widely on that. But it’s more emphasizing that it’s not an immediate problem in the sense that we don’t have AGI yet, but we are trying to foresee what kind of problems might happen with AGI and make sure that if and when AGI is built that it is as safe and aligned with human preferences as possible.

And in particular as a result of the mainstreaming of AI safety that has happened in the past two years, partly, as I like to think, due to FLI’s efforts, at this point it makes sense to focus on long-term safety more specifically since this is still the most neglected area in the AI safety field. I’ve been very happy to see lots and lots of work happening these days on adversarial examples, fairness, privacy, unemployment, security and so on.  I think this allows us to really zoom in and focus on AGI safety specifically to make sure that there’s enough good technical work going on in this field and that the big technical problems get as much progress as possible and that the research community continues to grow and do well.

In terms of the kind of problems that I would want to see solved, I think some of the most difficult problems in AI safety that sort of feed into a lot of the problem areas that we have are things like Goodhart’s Law. Goodhart’s Law is basically that, when a metric becomes a target, it ceases to be a good metric. And the way this applies to AI is that if we make some kind of specification of what objective we want the AI system to optimize for — for example this could be a reward function, or a utility function, or something like that — then, this specification becomes sort of a proxy or a metric for our real preferences, which are really hard to pin down in full detail. Then if the AI system explicitly tries to optimize for the metric or for that proxy, for whatever we specify, for the reward function that we gave, then it will often find some ways to follow the letter but not the spirit of that specification.

Ariel: Can you give a real life example of Goodhart’s Law today that people can use as an analogy?

Victoria: Certainly. So Goodhart’s Law was not originally coined in AI. This is something that generally exists in economics and in human organizations. For example, if employees at a company have their own incentives in some way, like they are incentivized to clock in as many hours as possible, then they might find a way to do that without actually doing a lot of work. If you’re not measuring that then the number of hours spent at work might be correlated with how much output you produce, but if you just start rewarding people for the number of hours then maybe they’ll just play video games all day, but they’ll be in the office. That could be a human example.

There are also a lot of AI examples these days of reward functions that turn out not to give good incentives to AI systems.

Ariel: For a human example, would the issues that we’re seeing with standardized testing be an example of this?

Victoria: Oh, certainly, yes. I think standardized testing is a great example where when students are optimizing for doing well on the tests, then the test is a metric and maybe the real thing you want is learning, but if they are just optimizing for doing well on the test, then actually learning can suffer because they find some way to just memorize or study for particular problems that will show up on the test, which is not necessarily a good way to learn.

And if we get back to AI examples, there was a nice example from OpenAI last year where they had this reinforcement learning agent that was playing a boat racing game and the objective of the boat racing game was to go along the racetrack as fast as possible and finish the race before the other boats do, and to encourage the player to go along the track there were some reward points — little blocks that you have to hit to get rewards — that were along the track, and then the agent just found a degenerate solution where it would just go in a circle and hit the same blocks over and over again and get lots of reward, but it was not actually playing the game or winning the race or anything like that. This is an example of Goodhart’s Law in action. There are plenty of examples of this sort with present day reinforcement learning systems. Often when people are designing a reward function for a reinforcement learning system they end up adjusting it a number of times to eliminate these sort of degenerate solutions that happen.

And this is not limited to reinforcement learning agents. For example, recently there was a great paper that came out about many examples of Goodhart’s Law in evolutionary algorithms. For example, if some evolved agents were incentivized to move quickly in some direction, then they might just evolve to be really tall and then they fall in this direction instead of actually learning to move. There are lots and lots of examples of this and I think that as AI systems become more advanced and more powerful, then I think they’ll just get more clever at finding these sort of loopholes in our specifications of what we want them to do. Goodhart’s Law is, I would say, part of what’s behind various other AI safety issues. For example, negative side effects are often caused by the agent’s specification being incomplete, so there’s something that we didn’t specify.

For example, if we want a robot to carry a box from point A to point B, then if we just reward it for getting the box to point B as fast as possible, then if there’s something in the path of the robot — for example, there’s a vase there — then it will not have an incentive to go around the vase, it would just go right through the vase and break it just to get to point B as fast as possible, and this is an issue because our specification did not include a term for the state of the vase. So, when data is just optimizing for this reward that’s all about the box, then it doesn’t have an incentive to avoid disruptions to the environment.

Ariel: So I want to interrupt with a quick question. These examples so far, we’re obviously worried about them with a technology as powerful as AGI, but they’re also things that apply today. As you mentioned, Goodhart’s Law doesn’t even just apply to AI. What progress has been made so far? Are we seeing progress already in addressing some of these issues?

Victoria: We haven’t seen so much progress in addressing these questions in a very general sort of way, because when you’re building a narrow AI system, then you can often get away with a sort of trial and error approach where you run it and maybe it does something stupid, finds some degenerate solution, then you tweak your reward function, you run it again and maybe it finds a different degenerate solution and then so on and so forth until you arrive at some reward function that doesn’t lead to obvious failure cases like that. For many narrow systems and narrow applications where you can sort of foresee all the ways in which things can go wrong, and just penalize all those ways or build a reward function that avoids all of those failure modes, then there isn’t so much need to find a general solution to these problems. While as we get closer to general intelligence, there will be more need for more principled and more general approaches to these problems.

For example, how do we build an agent that has some idea of what side effects are, or what it means to disrupt an environment that it’s in, no matter what environment you put it in. That’s something we don’t have yet. One of the promising approaches that has been gaining traction recently is reward learning. For example, there was this paper in collaboration between DeepMind and OpenAI called Deep Reinforcement Learning from Human Preferences, where instead of directly specifying a reward function for the agent, it learns a reward function from human feedback. Where, for example, if your agent is this simulated little noodle or hopper that’s trying to do a backflip, then the human would just look at two videos off the agent trying to do a backflip and say, “Well this one looks more like a back flip.” And so, you have a bunch of data from the human about what is more similar to what the human wants the agent to do.

With this kind of human feedback, unlike, for example, demonstrations, the agent can learn something that the human might not be able to demonstrate very easily. For example, even if I cannot do a backflip myself, I can still judge whether someone else has successfully done a backflip or whether this reinforcement agent has done a backflip. This is promising for getting agents to potentially solve problems that humans cannot solve or do things that humans cannot demonstrate. Of course, with human feedback and human-in-the-loop kind of work, there is always the question of scalability because human time is expensive and we want the agent to learn as efficiently as possible from limited human feedback and we also want to make sure that the agent actually gets human feedback in all the relevant situations so it learns to generalize correctly to new situations. There are a lot of remaining open problems in this area as well, but the progress so far has been quite encouraging.

Ariel: Are there others that you want to talk about?

Victoria: Maybe I’ll talk about one other question, which is that of interpretability. Interpretability of AI systems is something that is a big area right now in near-term AI safety that increasingly more people on the research community are thinking about and working on, that is also quite relevant in long-term AI safety. This generally has to do with being able to understand why your system does things a certain way, or makes certain decisions or predictions, or in the case of an agent, why it takes certain actions and also understanding what different components of the system are looking for in the data or how the system is influenced by different inputs and so on. Basically making it less of a black box, and I think there is a reputation for deep learning systems in particular that they are seen as black boxes and it is true that they are quite complex, but I think they don’t necessarily have to be black boxes and there has certainly been progress in trying to explain why they do things.

Ariel: Do you have real world examples?

Victoria: So, for example, if you have some AI system that’s used for medical diagnosis, then on the one hand you could have something simple like a decision tree that just looks at your x-ray and if there is something in a certain position then it gives you a certain diagnosis, and otherwise it doesn’t and so on. Or you could have a more complex system like a neural network that takes into account a lot more factors and then at the end it says, like maybe this person has cancer or maybe this person has something else. But it might not be immediately clear why that diagnosis was made. Particularly in sensitive applications like that, what sometimes happens is that people end up using simpler systems that they find more understandable where they can say why a certain diagnosis was made, even if those systems are less accurate, and that’s one of the important cases for interpretability where if we figure out how to make these more powerful systems more interpretable, for example, through visualization techniques, then they would actually become more useful in these really important applications where it actually matters not just to predict well, but to explain where the prediction came from.

And another area, another example is an algorithm that’s deciding whether to give someone a loan or a mortgage, then if someone’s loan application got rejected then they would really want to know why it got rejected. So the algorithm has to be able to point at some variables or some other aspect of the data that influences decisions or you might need to be able to explain how the data will need to change for the decision to change, what variables would need to be changed by a certain amount for the decision to be different. So these are just some examples of how this can be important and how this is already important. And this kind of interpretability of present day systems is of course already on a lot of people’s minds. I think it is also important to think about interpretability in the longer term as we build more general AI systems that will continue to be important or maybe even become more important to be able to look inside them and be able to check if they have particular concepts that they’re representing.

Like, for example, especially from a safety perspective, whether your system was thinking about the off switch and if it’s thinking about whether it’s going to be turned off, that might be something good to monitor for. We also would want to be able to explain how our systems fail and why they fail. This is, of course, quite relevant today if, let’s say your medical diagnosis AI makes a mistake and we want to know what led to that, why it made the wrong diagnosis. Also on the longer term we want to know why an AI system hacks its reward function, what is it thinking — well “thinking” with quotes, of course — while it’s following a degenerate solution instead of the kind of solution we would want it to find. So, what is the boat race agent that I mentioned earlier paying attention to while it’s going in circles and collecting the same rewards over and over again instead of playing the game, that kind of thing. I think the particular application of interpretability techniques to safety problems is going to be important and it’s one of the examples of the kind of work that we’re looking for in the in the RFP.

Ariel: Awesome. Okay, and so, we’ve been talking about how all these things can go wrong and we’re trying to do all this research to make sure things don’t go wrong, and yet basically we think it’s worthwhile to continue designing artificial intelligence, that no one’s looking at this and saying “Oh my god, artificial intelligence is awful, we need to stop studying it or developing it.” So what are the benefits that basically make these risks worth the risk?

Shahar: So I think one thing is in the domain of narrow applications, it’s very easy to make analogies to software, right? For the things that we have been able to hand over to computers, they really have been the most boring and tedious and repetitive things that humans can do and we now no longer need to do them and productivity has gone up and people are generally happier and they can get paid more for doing more interesting things and we can just build bigger systems because we can hand off the control of them to machines that don’t need to sleep and don’t make small mistakes in calculations. Now the promise of turning that and adding to that all of the narrow things that experts can do, whether it’s improving medical diagnosis, whether it’s maybe farther down the line some elements of drug discovery, whether it’s piloting a car or operating machinery, many of these areas where human labor is currently required because there is a fuzziness to the task, it does not enable a software engineer to come in and code an algorithm, but maybe with machine learning in the not too distant future we’ll be able to turn them over to machines.

It means taking some skills that only a few individuals in the world can do and making those available to everyone around the world in some domains. That seems, I mean, concrete examples are, the ones that I have I try to find the companies that do them and get involved with them because I want to see them happen sooner and the ones that I can’t imagine yet, someone will come along and make a company out of it, or a not-for-profit for it. But we’ve seen applications from agriculture, to medicine, to computer security, to entertainment and art, and driving and transport, and in all of these I think we’re just gonna be seeing even more. I think we’re gonna have more creative products out there that were designed in collaboration between humans and machines. We’re gonna see more creative solutions to scientific engineering problems. We’re gonna see those professions where really good advice is very valuable, but there are only so many people who can help you — so if I’m thinking of doctors and lawyers, taking some of that advice and making it universally accessible through an app just makes life smoother. These are some of the examples that come to my mind.

Ariel: Okay, great. Victoria what are the benefits that you think make these risks worth addressing?

Victoria: I think there are many ways in which AI systems can make our lives a lot better and make the world a lot better especially as we build more general systems that are more adaptable. For example, these systems could help us with designing better institutions and better infrastructure, better health systems or electrical systems or what have you. Even now, there are examples like the Google project on optimizing the data center energy use using machine learning, which is something that Deep Mind was working on, where the use of machine learning algorithms to automate energy used in the data centers improved their energy efficiency by I think something like 40 percent. That’s of course with fairly narrow AI systems.

I think as we build more general AI systems we can expect, we can hope for really creative and innovative solutions to the big problems that humans face. So you can think of something like AlphaGo’s famous “move 37” that overturned thousands of years of human wisdom in Go. What if you can build even more general and even more creative systems and apply them to real world problems? I think there is great promise in that. I think this can really transform the world in a positive direction, and we just have to make sure that as the systems are built that we think about safety from the get go and think about it in advance and trying to build them to be as resistant to accidents and misuse as possible so that all these benefits can actually be achieved.

The things I mentioned were only examples of the possible benefits. Imagine if you could have an AI scientist that’s trying to develop better drugs against diseases that have really resisted treatment or more generally just doing science faster and better if you actually have more general AI systems that can think as flexibly as humans can about these sort of difficult problems. And they would not have some of the limitations that humans have where, for example, our attention is limited our memory is limited, while AI could be, at least theoretically, unlimited in it’s processing power, in the resources available to it, it can be more parallelized, it can be more coordinated and I think all of the big problems that are so far unsolved are these sort of coordination problems that require putting together a lot of different pieces of information and a lot of data. And I think there are massive benefits to be reaped there if we can only get to that point safely.

Ariel: Okay, great. Well thank you both so much for being here. I really enjoyed talking with you.

Shahar: Thank you for having us. It’s been really fun.

Victoria: Yeah, thank you so much.

[end of recorded material]

Podcast: AI and the Value Alignment Problem with Meia Chita-Tegmark and Lucas Perry

What does it mean to create beneficial artificial intelligence? How can we expect to align AIs with human values if humans can’t even agree on what we value? Building safe and beneficial AI involves tricky technical research problems, but it also requires input from philosophers, ethicists, and psychologists on these fundamental questions. How can we ensure the most effective collaboration?

Ariel spoke with FLI’s Meia Chita-Tegmark and Lucas Perry on this month’s podcast about the value alignment problem: the challenge of aligning the goals and actions of AI systems with the goals and intentions of humans. 

Topics discussed in this episode include:

  • how AGI can inform human values,
  • the role of psychology in value alignment,
  • how the value alignment problem includes ethics, technical safety research, and international coordination,
  • a recent value alignment workshop in Long Beach,
  • and the possibility of creating suffering risks (s-risks).

This podcast was edited by Tucker Davey. You can listen to it above or read the transcript below.

 

Ariel: I’m Ariel Conn with the Future of Life Institute, and I’m excited to have FLI’s Lucas Perry and Meia Chita-Tegmark with me today to talk about AI, ethics and, more specifically, the value alignment problem. But first, if you’ve been enjoying our podcast, please take a moment to subscribe and like this podcast. You can find us on iTunes, SoundCloud, Google Play, and all of the other major podcast platforms.

And now, AI, ethics, and the value alignment problem. First, consider the statement “I believe that harming animals is bad.” Now, that statement can mean something very different to a vegetarian than it does to an omnivore. Both people can honestly say that they don’t want to harm animals, but how they define “harm” is likely very different, and these types of differences in values are common between countries and cultures, and even just between individuals within the same town. And then we want to throw AI into the mix. How can we train AIs to respond ethically to situations when the people involved still can’t come to an agreement about what an ethical response should be?

The problem is even more complicated because often we don’t even know what we really want for ourselves, let alone how to ask an AI to help us get what we want. And as we’ve learned with stories like that of King Midas, we need to be really careful what we ask for. That is, when King Midas asked the genie to turn everything to gold, he didn’t really want everything — like his daughter and his food — turned to gold. And we would prefer than an AI we design recognize that there’s often implied meaning in what we say, even if we don’t say something explicitly. For example, if we jump into an autonomous car and ask it to drive us to the airport as fast as possible, implicit in that request is the assumption that, while we might be OK with some moderate speeding, we intend for the car to still follow most rules of the road, and not drive so fast as to put anyone’s life in danger or take illegal routes. That is, when we say “as fast as possible,” we mean “as fast as possible within the rules of law,” and not within the rules of physics or within the laws of physics. And these examples are just the tiniest tip of the iceberg, given that I didn’t even mention artificial general intelligence (AGI) and how that can be developed such that its goals align with our values.

So as I mentioned a few minutes ago, I’m really excited to have Lucas and Meia joining me today. Meia is a co-founder of the Future of Life Institute. She’s interested in how social sciences can contribute to keeping AI beneficial, and her background is in social psychology. Lucas works on AI and nuclear weapons risk-related projects at FLI. His background is in philosophy with a focus on ethics. Meia and Lucas, thanks for joining us today.

Meia: It’s a pleasure. Thank you.

Lucas: Thanks for having us.

Ariel: So before we get into anything else, one of the big topics that comes up a lot when we talk about AI and ethics is this concept value alignment. I was hoping you could both maybe talk just a minute about what value alignment is and why it’s important to this question of AI and ethics.

Lucas: So value alignment, in my view, is bringing AI’s goals, actions, intentions and decision-making processes in accordance with what humans deem to be the good or what we see as valuable or what our ethics actually are.

Meia: So for me, from the point of view of psychology, of course, I have to put the humans at the center of my inquiry. So from that point of view, value alignment … You can think about it also in terms of humans’ relationships with other humans. But I think it’s even more interesting when you add artificial agents into the mix. Because now you have an entity that is so wildly different from humans yet we would like it to embrace our goals and our values in order to keep it beneficial for us. So I think the question of value alignment is very central to keeping AI beneficial.

Lucas: Yeah. So just to expand on what I said earlier: The project of value alignment is in the end creating beneficial AI. It’s working on what it means for something to be beneficial, what beneficial AI exactly entails, and then learning how to technically instantiate that into machines and AI systems. Also, building the proper like social and political context for that sort of technical work to be done and for it to be fulfilled and manifested in our machines and AIs.

Ariel: So when you’re thinking of AI and ethics, is value alignment basically synonymous, just another way of saying AI and ethics or is it a subset within this big topic of AI and ethics?

Lucas: I think they have different connotations. If one’s thinking about AI ethics, I think that one is tending to be moreso focused on applied ethics and normative ethics. One might be thinking about the application of AI systems and algorithms and machine learning in domains in the present day and in the near future. So one might think about atomization and other sorts of things. I think that when one is thinking about value alignment, it’s much more broad and expands also into metaethics and really sort of couches and frames the problem of AI ethics as something which happens over decades and which has a tremendous impact. I think that value alignment has a much broader connotation than what AI ethics has traditionally had.

Meia: I think it all depends on how you define value alignment. I think if you take the very broad definition that Lucas has just proposed, I think that yes, it probably includes AI ethics. But you can also think of it more narrowly as simply instantiating your own values into AI systems and having them adopt your goals. In that case, I think there are other issues as well because if you think about it from the point of view of psychology, for example, then it’s not just about which values get instantiated and how you do that, how you solve the technical problem, but also we know that humans, even if they know what goals they have and what values they uphold, it’s very, very hard for them sometimes to actually act in accordance to them because they have all sorts of cognitive and emotional effective limitations. So in that case I think value alignment is, in this narrow sense, is basically not sufficient. We also need to think about AIs and applications of AIs in terms of how do they help us and how do they make sure that we gain the cognitive competencies that we need to be moral beings and to be really what we should be, not just what we are.

Lucas: Right. I guess to expand on what I was just saying. Value alignment I think in the more traditional sense, it’s sort of all … It’s more expansive and inclusive in that it’s recognizing a different sort of problem than AI ethics alone has. I think that when one is thinking about value alignment, there are elements of thinking about — somewhat about machine ethics but also about social, political, technical and ethical issues surrounding the end goal of eventually creating AGI. Whereas, AI ethics can be more narrowly interpreted just as certain sorts of specific cases where AI’s having impact and implications in our lives in the next 10 years. Whereas, value alignment’s really thinking about the instantiation of ethics and machines and making machine systems that are corrigible and robust and docile, which will create a world that we’re all happy about living in.

Ariel: Okay. So I think that actually is going to flow really nicely into my next question, and that is, at FLI we tend to focus on existential risks. I was hoping you could talk a little bit about how issues of value alignment are connected to the existential risks that we concern ourselves with.

Lucas: Right. So, we can think of AI systems as being very powerful optimizers. We can imagine there being a list of all possible futures and what intelligence is good for is for modeling the world and then committing to and doing actions which constrain the set of all possible worlds to ones which are desirable. So intelligence is sort of the means by which we get to an end, and ethics is the end towards which we strive. So these are how these two things really integral and work together and how AI without ethics makes no sense and how ethics without AI or intelligence in general also just doesn’t work. So in terms of existential risk, there are possible futures that intelligence can lead us to where earth-originating intelligent life no longer exists either intentionally or by accident. So value alignment sort of fits in by constraining the set of all possible futures by working on technical work by doing political and social work and also work in ethics to constrain the actions of AI systems such that existential risks do not occur, such that by some sort of technical oversight, by some misalignment of values, by some misunderstanding of what we want, the AI generates an existential risk.

Meia: So we should remember that homo sapiens represent an existential risk to itself also. We are creating nuclear weapons. We have more of them than we need. So many, in fact, that we could destroy the entire planet with them. Not to mention homo sapiens has also represented an existential risk for all other species. The problem is AI is that we’re introducing in the mix a whole new agent that is by definition supposed to be more intelligent, more powerful than us and also autonomous. So as Lucas mentioned, it’s very important to think through what kind of things and abilities do we delegate to these AIs and how can we make sure that they have the survival and the flourishing of our species in mind. So I think this is where value alignment comes in as a safeguard against these very terrible and global risks that we can imagine coming from AI.

Lucas: Right. What makes doing that so difficult is beyond the technical issue of just having AI researchers and AI safety researchers knowing how to just get AI systems to actually do what we want without creating a universe of paperclips. There’s also this terrible social and political context in which this is all happening where there is really great game-theoretic incentives to be the first person to create artificial general intelligence. So in a race to create AI, a lot of these efforts that seem very obvious and necessary could be cut in favor of more raw power. I think that’s probably one of the biggest risks for us not succeeding in creating value-aligned AI.

Ariel: Okay. Right now it’s predominantly technical AI people who are considering mostly technical AI problems. How to solve different problems is usually, you need a technical approach for this. But when it comes to things like value alignment and ethics, most of the time I’m hearing people suggest that we can’t leave that up to just the technical AI researchers. So I was hoping you could talk a little bit about who should be part of this discussion, why we need more people involved, how we can get more people involved, stuff like that.

Lucas: Sure. So maybe if I just break the problem down into just what I view to be the three different parts then talking about it will make a little bit more sense. So we can break down the value alignment problem into three separate parts. The first one is going to be the technical issues, the issues surrounding actually creating artificial intelligence. The issues of ethics, so the end towards which we strive. The set of possible futures which we would be happy in living, and then also there’s the governance and the coordination and the international problem. So we can sort of view this as a problem of intelligence, a problem of agreeing on the end towards which intelligence is driven towards, and also the political and social context in which all of this happens.

So thus far, there’s certainly been a focus on the technical issue. So there’s been a big rise in the field of AI safety and in attempts to generate beneficial AI, attempts at creating safe AGI and mechanisms for avoiding reward hacking and other sorts of things that happen when systems are trying to optimize their utility function. The Concrete Problems on AI Safety paper has been really important and sort of illustrates some of these technical issues. But even between technical AI safety research and ethics there’s disagreement about something also like machine ethics. So how important is machine ethics? Where does machine ethics fit in to technical AI safety research? How much time and energy should we put into certain kinds of technical AI research versus how much time and effort should we put into issues in governance and coordination and addressing the AI arms race issues? How much of ethics do we really need to solve?

So I think there’s a really important and open question regarding how do we apply and invest our limited resources in sort of addressing these three important cornerstones in value alignment so that the technical issue, the issues in ethics and then issues in governance and coordination, and how do we optimize working on these issues given the timeline that we have? How much resources should we put in each one? I think that’s an open question. Yeah, one that certainly needs to be addressed more about how we’re going to move forward given limited resources.

Meia: I do think though the focus so far has been so much on the technical aspect. As you were saying, Lucas, there are other aspects to this problem that need to be tackled. What I’d like to emphasize is that we cannot solve the problem if we don’t pay attention to the other aspects as well. So I’m going to try to defend, for example, psychology here, which has been largely ignored I think in the conversation.

So from the point of view of psychology, I think the value alignment problem is double fold in a way. It’s about a triad of interactions. Human, AI, other humans, right? So we are extremely social animals. We interact a lot with other humans. We need to align our goals and values with theirs. Psychology has focused a lot on that. We have a very sophisticated set of psychological mechanisms that allow us to engage in very rich social interactions. But even so, we don’t always get it right. Societies have created a lot of suffering, a lot of moral harm, injustice, unfairness throughout the ages. So for example, we are very ill-prepared by our own instincts and emotions to deal with inter-group relations. So that’s very hard.

Now, people coming from the technical side, they can say, “We’re just going to have AI learn our preferences.” Inverse reinforcement learning is a proposal that says that basically explains how to keep humans in the loop. So it’s a proposal for programing AI such that it gets its reward not from achieving a goal but from getting good feedback from a human because it achieved a goal. So the hope is that this way AI can be correctable and can learn from human preferences.

As a psychologist, I am intrigued, but I understand that this is actually very hard. Are we humans even capable of conveying the right information about our preferences? Do we even have access to them ourselves or is this all happening in some sort of subconscious level? Sometimes knowing what we want is really hard. How do we even choose between our own competing preferences? So this involves a lot more sophisticated abilities like impulse control, executive function, etc. I think that if we don’t pay attention to that as well in addition to solving the technical problem, I think we are very likely to not get it right.

Ariel: So I’m going to want to come back to this question of who should be involved and how we can get more people involved, but one of the reasons that I’m talking to the both of you today is because you actually have made some steps in broadening this discussion already in that you set up a workshop that did bring together a multidisciplinary team to talk about value alignment. I was hoping you could tell us a bit more about how that workshop went, what interesting insights were gained that might have been expressed during the workshop, what you got out of it, why you think it’s important towards the discussion? Etc.

Meia: Just to give a few facts about the workshop. The workshop took place in December 2017 in Long Beach, California. We were very lucky to have two wonderful partners in co-organizing this workshop. The Berggruen Institute and the Canadian Institute for Advanced Research. And the idea for the workshop was very much to have a very interdisciplinary conversation about value alignment and reframe it as not just a technical problem but also one that involves disciplines such as philosophy and psychology, political science and so on. So we were very lucky actually to have a fantastic group of people there representing all these disciplines. The conversation was very lively and we discussed topics all the way from near term considerations in AI and how we align AI to our goals and also all the way to thinking about AGI and even super intelligence. So it was a fascinating range both of topics discussed and also perspectives being represented.

Lucas: So my inspiration for the workshop was being really interested in ethics and the end towards which this is all going. What really is the point of creating AGI and perhaps even eventually superintelligence? What is it that is good and what is that is valuable? Broadening from that and becoming more interested in value alignment, the conversation thus far has been primarily understood as something that is purely technical. So value alignment has only been seen as something that is for technical AI safety researchers to work on because there are technical issues regarding AI safety and how you get AIs to do really simple things without destroying the world or ruining a million other things that we care about. But this is really, as we discussed earlier, an interdependent issue that covers issues in metaethics and normative ethics, applied ethics. It covers issues in psychology. It covers issue in law, policy, governance, coordination. It covers the AI arms race issue. Solving the value alignment problem and creating a future with beneficial AI is a civilizational project where we need everyone working on all these different issues. On issues of value, on issues of game theory among countries, on the technical issues, obviously.

So what I really wanted to do was I wanted to start this workshop in order to broaden the discussion. To reframe value alignment as not just something in technical AI research but something that really needs voices from all disciplines and all expertise in order to have a really robust conversation that reflects the interdependent nature of the issue and where different sorts of expertise on the different parts of the issue can really come together and work on it.

Ariel: Is there anything specific that you can tell us about what came out of the workshop? Were there any comments that you thought were especially insightful or ideas that you think are important for people to be considering?

Lucas: I mean, I think that for me one of the takeaways from the workshop is that there’s still a mountain of work to do and that there are a ton of open questions. This is a very, very difficult issue. I think that one thing I took away from the workshop was that we couldn’t even agree on the minimal conditions for which it would be okay to safely deploy AGI. There are just issues that seem extremely trivial in value alignment from the technical side and from the ethical side that seem very trivial, but on which I think there is very little understanding or agreement right now.

Meia: I think the workshop was a start and one good thing that happened during the workshop is I felt that the different disciplines or rather their representatives were able to sort of air out their frustrations and also express their expectations of the others. So I remember this quite iconic moment when one roboticist simply said, “But I really want you ethics people to just tell me what to implement in my system. What do you want my system to do?” So I think that was actually very illustrative of what Lucas was saying — the need for more joint work. I think there was a lot of expectations I think from both the technical people towards the ethicists but also from the ethicists in terms of like, “What are you doing? Explain to us what are the actual ethical issues that you think you are facing with the things that you are building?” So I think there’s a lot of catching up to do on both sides and there’s much work to be done in terms of making these connections and bridging the gaps.

Ariel: So you referred to this as sort of a first step or an initial step. What would you like to see happen next?

Lucas: I don’t have any concrete or specific ideas for what exactly should happen next. I think that’s a really difficult question. Certainly, things that most people would want or expect. I think in the general literature and conversations that we were having, I think that value alignment, as a word and as something that we understand, needs to be expanded outside of the technical context. I don’t think that it’s expanded that far. I think that more ethicists and more moral psychologists and people in law policy and governance need to come in and need to work on this issue. I’d like to see more coordinated collaborations, specifically involving interdisciplinary crowds informing each other and addressing issues and identifying issues and really some sorts of formal mechanisms for interdisciplinary coordination on value alignment.

It would be really great if people in technical research, in technical AI safety research and in ethics and governance could also identify all of the issues in their own fields, which the resolution to those issues and the solution to those issues requires answers from other fields. So for example, inverse reinforcement learning is something that Meia was talking about earlier and I think it’s something that we can clearly decide and see as being interdependent on a ton of issues in a law and also in ethics and in value theory. So that would be sort of like an issue or node in the landscape of all issues and technical safety research that would be something that is interdisciplinary.

So I think it would be super awesome if everyone from their own respective fields are able to really identify the core issues which are interdisciplinary and able to dissect them into the constituent components and sort of divide them among the disciplines and work together on them and identify the different timelines at which different issues need to be worked on. Also, just coordinate on all those things.

Ariel: Okay. Then, Lucas, you talked a little bit about nodes and a landscape, but I don’t think we’ve explicitly pointed out that you did create a landscape of value alignment research so far. Can you talk a little bit about what that is and how people can use it?

Lucas: Yeah. For sure. With the help of other colleagues at the Future of Life Institute like Jessica Cussins and Richard Mallah, we’ve gone ahead and created a value alignment conceptual landscape. So what this is is it’s a really big tree, almost like an evolutionary tree that you would see, but what it is, is a conceptual mapping and landscape of the value alignment problem. What it’s broken down into are the three constituent components, which we were talking about earlier, which is the technical issues, the issues in technically creating safe AI systems. Issues in ethics, breaking that down into issues in metaethics and normative ethics and applied ethics and moral psychology and descriptive ethics where we’re trying to really understand values, what it means for something to be valuable and what is the end towards which intelligence will be aimed at. Then also, the other last section is governance. So issues in coordination and policy and law in creating a world where AI safety research can proceed and where there aren’t … Where we don’t develop or allow a sort of winner-take-all scenario to rush us towards the end and not really have a final and safe solution towards fully autonomous powerful systems.

So what the landscape here does is it sort of outlines all of the different conceptual nodes in each of these areas. It lays out what all the core concepts are, how they’re all related. It defines the concepts and also gives descriptions about how the concepts fit into each of these different sections of ethics, governance, and technical AI safety research. So the hope here is that people from different disciplines can come and see the truly interdisciplinary nature of the value alignment problem, to see where ethics and governance and the technical AI safety research stuff all fits in together and how this all together really forms, I think, the essential corners of the value alignment problem. It’s also nice for researchers and other persons to understand the concepts and the landscape of the other parts of this problem.

I think that, for example, technical AI safety researchers probably don’t know much about metaethics or they don’t spend too much time thinking about normative ethics. I’m sure that ethicists don’t spend very much time thinking about technical value alignment and how inverse reinforcement learning is actually done and what it means to do robust human imitation in machines. What are the actual technical, ethical mechanisms that are going to go into AI systems. So I think that this is like a step in sort of laying out the conceptual landscape, in introducing people to each other’s concepts. It’s a nice visual way of interacting with I think a lot of information and sort of exploring all these different really interesting nodes that explore a lot of very deep, profound moral issues, very difficult and interesting technical issues, and issues in law, policy and governance that are really important and profound and quite interesting.

Ariel: So you’ve referred to this as the value alignment problem a couple times. I’m curious, do you see this … I’d like both of you to answer this. Do you see this as a problem that can be solved or is this something that we just always keep working towards and it’s going to influence — whatever the current general consensus is will influence how we’re designing AI and possibly AGI, but it’s not ever like, “Okay. Now we’ve solved the value alignment problem.” Does that make sense?

Lucas: I mean, I think that that sort of question really depends on your metaethics, right? So if you think there are moral facts, if you think that more statements can be true or false and aren’t just sort of subjectively dependent upon whatever our current values and preferences historically and evolutionarily and accidentally happen to be, then there is an end towards which intelligence can be aimed that would be objectively good and which would be the end toward which we would strive. In that case, if we had solved the technical issue and the governance issue and we knew that there was a concrete end towards which we would strive that was the actual good, then the value alignment problem would be solved. But if you don’t think that there is a concrete end, a concrete good, something that is objectively valuable across all agents, then the value alignment problem or value alignment in general is an ongoing process and evolution.

In terms of the technical and governance sides of those, I think that there’s nothing in the laws of physics or I think in computer science or in game theory that says that we can’t solve those parts of the problem. Those ones seem intrinsically like they can be solved. That’s nothing to say about how easy or how hard it is to solve those. But whether or not there is sort of an end towards value alignment I think depends on difficult questions in metaethics and whether something like moral error theory is true where all moral statements are simply false and that morality is maybe sort of just like a human invention, which has no real answers or who’s answers are all false. I think that’s sort of the crux of whether or not value alignment can “be solved” because I think the technical issues and the issues in governance are things which are in principle able to be solved.

Ariel: And Meia?

Meia: I think that regardless of whether there is an absolute end to this problem or not, there’s a lot of work that we need to do in between. I also think that in order to even achieve this end, we need more intelligence, but as we create more intelligent agents, again, this problem gets magnified. So there’s always going to be a race between the intelligence that we’re creating and making sure that it is beneficial. I think at every step of the way, the more we increase the intelligence, the more we need to think about the broader implications. I think in the end we should think of artificial intelligence also not just as a way to amplify our own intelligence but also as a way to amplify our moral competence as well. As a way to gain more answers regarding ethics and what our ultimate goals should be.

So I think that the interesting questions that we can do something about are somewhere sort of in between. We will not have the answer before we are creating AI. So we always have to figure out a way to keep up with the development of intelligence in terms of our development of moral competence.

Ariel: Meia, I want to stick with you for just a minute. When we talked for the FLI end of your podcast, one of the things you said you were looking forward to in 2018 is broadening this conversation. I was hoping you could talk a little bit more about some of what you would like to see happen this year in terms of getting other people involved in the conversation, who you would like to see taking more of an interest in this?

Meia: So I think that unfortunately, especially in academia, we’ve sort of defined our work so much around these things that we call disciplines. I think we are now faced with problems, especially in AI, that really are very interdisciplinary. We cannot get the answers from just one discipline. So I would actually like to see in 2018 more sort of, for example, funding agencies proposing and creating funding sources for interdisciplinary projects. The way it works, especially in academia, so you propose grants to very disciplinary-defined granting agencies.

Another thing that would be wonderful to start happening is our education system is also very much defined and described around these disciplines. So I feel that, for example, there’s a lack of courses, for example, that teach students in technical fields things about ethics, moral psychology, social sciences and so on. The converse is also true; in social sciences and in philosophy we hear very little about advancements in artificial intelligence and what’s new and what are the problems that are there. So I’d like to see more of that. I’d like to see more courses like this developed. I think a friend of mine and I, we’ve spent some time thinking about how many courses are there that have an interdisciplinary nature and actually talk about the societal impacts of AI and there’s a handful in the entire world. I think we counted about five or six of them. So there’s a shortage of that as well.

But then also educating the general public. I think thinking about the implications of AI and also the societal implications of AI and also the value alignment problem is something that’s probably easier for the general public to grasp rather than thinking about the technical aspects of how to make it more powerful or how to make it more intelligent. So I think there’s a lot to be done in educating, funding, and also just simply having these conversations. I also very much admire what Lucas has been doing. I hope he will expand on it, creating this conceptual landscape so that we have people from different disciplines understanding their terms, their concepts, each other’s theoretical frameworks with which they work. So I think all of this is valuable and we need to start. It won’t be completely fixed in 2018 I think. But I think it’s a good time to work towards these goals.

Ariel: Okay. Lucas, is there anything that you wanted to add about what you’d like to see happen this year?

Lucas: I mean, yeah. Nothing else I think to add on to what I said earlier. Obviously we just need as many people from as many disciplines working on this issue because it’s so important. But just to go back a little bit, I was also really liking what Meia said about how AI systems and intelligence can help us with our ethics and with our governance. I think that seems like a really good way forward potentially if as our AI systems grow more powerful in their intelligence, they’re able to inform us moreso about our own ethics and our own preferences and our own values, about our own biases and about what sorts of values and moral systems are really conducive to the thriving of human civilization and what sorts of moralities lead to sort of navigating the space of all possible minds in a way that is truly beneficial.

So yeah. I guess I’ll be excited to see more ways in which intelligence and AI systems can be deployed for really tackling the question of what beneficial AI exactly entails. What does beneficial mean? We all want beneficial AI, but what is beneficial, what does that mean? What does that mean for us in a world in which no one can agree on what beneficial exactly entails? So yeah, I’m just excited to see how this is going to work out, how it’s going to evolve and hopefully we’ll have a lot more people joining this work on this issue.

Ariel: So your comment reminded me of a quote that I read recently that I thought was pretty interesting. I’ve been reading Paula Boddington’s book Toward a Code of Ethics for Artificial Intelligence. This was actually funded at least in part if not completely by FLI grants. But she says, “It’s worth pointing out that if we need AI to help us make moral decisions better, this cast doubt on the attempts to ensure humans always retain control over AI.” I’m wondering if you have any comments on that.

Lucas: Yeah. I don’t know. I think this sort of a specific way of viewing the issue or it’s a specific way of viewing what AI systems are for and the sort of future that we want. In the end is the best at all possible futures a world in which human beings ultimately retain full control over AI systems. I mean, if AI systems are autonomous and if value alignment actually succeeds, then I would hope that we created AI systems which are more moral than we are. AI systems which have better ethics, which are less biased, which are more rational, which are more benevolent and compassionate than we are. If value alignment is able to succeed and if we’re able to create autonomous intelligent systems of that sort of caliber of ethics and benevolence and intelligence, then I’m not really sure what the point is of maintaining any sort of meaningful human control.

Meia: I agree with you, Lucas. That if we do manage to create … In this case, I think it would have to be artificial general intelligence that is more moral, more beneficial, more compassionate than we are, then the issue of control, it’s probably not so important. But in the meantime, I think, while we are sort of tinkering with artificial intelligent systems, I think the issue of control is very important.

Lucas: Yeah. For sure.

Meia: Because we wouldn’t want to … We wouldn’t want to cut out of the loop too early before we’ve managed to properly test the system, make sure that indeed it is doing what we intended to do.

Lucas: Right. Right. I think that in the process of that that it requires a lot of our own moral evolution, something which we humans are really bad and slow at. As president of FLI Max Tegmark likes to talk about, he likes to talk about the race between our growing wisdom and the growing power of our technology. Now, human beings are really kind of bad at keeping our wisdom in pace with the growing power of our technology. If we sort of look at the moral evolution of our species, we can sort of see huge eras in which things which were seen as normal and mundane and innocuous, like slavery or the subjugation of women or other sorts of things like that. Today we have issues with factory farming and animal suffering and income inequality and just tons of people who are living with exorbitant wealth that doesn’t really create much utility for them, whereas there’s tons of other people who are in poverty and who are still starving to death. There are all sorts of things that we can see in the past as being obviously morally wrong.

Meia: Under the present too.

Lucas: Yeah. So then we can see that obviously there must be things like that today. We wonder, “Okay. What are the sorts of things today that we see and innocuous and normal and as mundane that the people of tomorrow, as William MacAskill says, will see us as moral monsters? How are we moral monsters today, but we simply can’t see it? So as we create powerful intelligence systems and we’re working on our ethics and we’re trying to really converge on constraining the set of all possible worlds into ones which are good and which are valuable and ethical, it really demands a moral evolution of ourselves that we sort of have to figure out ways to catalyze and work on and move through, I think, faster.

Ariel: Thank you. So as you consider attempts to solve the value alignment problem, what are you most worried about, either in terms of us solving it badly or not quickly enough or something along those lines? What is giving you the most hope in terms of us being able to address this problem?

Lucas: I mean, I think just technically speaking, ignoring the likelihood of this — the worst of all possible outcomes would be something like an s-risk. So an s-risk is a subset of x-risks — s-risk stands for suffering risk. So this is a sort of risk whereby some sort of value misalignment, whether it be intentional or much more likely accidental, some seemingly astronomical amount of suffering is produced by deploying a misaligned AI system. The way that this was function is given certain sorts of assumptions about the philosophy of mind, about consciousness and machines, if we understand potentially consciousness and experience to be substrate-independent, meaning if consciousness can be instantiated in machine systems, that you don’t just need meat to be conscious, but you need something like integrated information or information processing or computation or something like that, then the invention of AI systems and superintelligence and the spreading of intelligence, which optimizes towards any sort of arbitrary end, it could potentially lead to vast amounts of digital suffering, which would potentially arise accidentally or through subroutines or simulations, which would be epistemically useful but that involve a great amount of suffering. That coupled with these artificial intelligent systems running on silicon and iron and not on squishy, wet, human neurons would be that it would be running at digital time scales and not biological time scales. So there would be huge amplification of the speed of which the suffering was run. So subjectively, we might infer that a second for a computer, a simulated person on a computer, would be much greater than that for a biological person. Then we can sort of reflect that these are the sorts of risks — or an s-risk would be something that would be really bad. Just any sort of way that AI can be misaligned and lead to a great amount of suffering. There’s a bunch of different ways that this could happen.

So something like an s-risk would be something super terrible but it’s not really clear how likely that would be. But yeah, I think that beyond that obviously we’re worried about existential risk, we’re worried about ways that this could curtail or destroy the development of earth-originating intelligent life. Ways that this really might happen are I think most likely because of this winner-take-all scenario that you have with AI. We’ve had nuclear weapons for a very long time now, and we’re super lucky that nothing bad has happened. But I think the human civilization is really good at getting stuck into minimum equilibria where we get locked into these positions where it’s not easy to escape from. So it’s really not easy to disarm and get out of the nuclear weapons situation once we’ve discovered it. Once we start to develop, I think, more powerful and robust AI systems, I think already that a race towards AGI and towards more and more powerful AI might be very, very hard to stop if we don’t make significant progress on that soon, if we’re not able to get a ban on lethal autonomous weapons and if we’re not able to introduce any real global coordination and that we all just start racing towards more powerful systems that there might be a race towards AGI, which would cut corners on safety and potentially make the likelihood of an existential risk or suffering risk more likely.

Ariel: Are you hopeful for anything?

Lucas: I mean, yeah. If we get it right, then the next billion years can be super amazing, right? It’s just kind of hard to internalize that and think about that. It’s really hard to say I think how likely it is that we’ll succeed in any direction. But yeah, I’m hopeful that if we succeed in value alignment that the future can be unimaginably good.

Ariel: And Meia?

Meia: What’s scary to me is that it might be too easy to create intelligence. That there’s nothing in the laws of physics making it hard for us. Thus I think that it might happen too fast. Evolution took a long time to figure out how to make us intelligent, but that was probably just because it was trying to optimize for things like energy consumption and making us a certain size. So that’s scary. It’s scary that it’s happening so fast. I’m particularly scared that it might be easy to crack general artificial intelligence. I keep asking Max, “Max, but isn’t there anything in the laws of physics that might make it tricky?” His answer and also that of more physicists that I’ve been discussing with is that, “No, it doesn’t seem to be the case.”

Now, what makes me hopeful is that we are creating this. Stuart Russell likes to give this example of a message from an alien civilization, an alien intelligence that says, “We will be arriving in 50 years.” Then he poses the question, “What would you do when you prepare for that?” But I think with artificial intelligence it’s different. It’s not like it’s arriving and it’s a given and it has a certain form or shape that we cannot do anything about. We are actually creating artificial intelligence. I think that’s what makes me hopeful that if we actually research it right, that if we think hard about what we want and we work hard at getting our own act together, first of all, and also on making sure that this stays and is beneficial, we have a good chance to succeed.

Now, there’ll be a lot of challenges in between from very near-term issues like Lucas was mentioning, for example, autonomous weapons, weaponizing our AI and giving it the right to harm and kill humans, to other issues regarding income inequality enhanced by technological development and so on, to down the road how do we make sure that autonomous AI systems actually adopt our goals. But I do feel that it is important to try and it’s important to work at it. That’s what I’m trying to do and that’s what I hope others will join us in doing.

Ariel: All right. Well, thank you both again for joining us today.

Lucas: Thanks for having us.

Meia: Thanks for having us. This was wonderful.

Ariel: If you’re interested in learning more about the value alignment landscape that Lucas was talking about, please visit FutureofLife.org/valuealignmentmap. We’ll also link to this in the transcript for this podcast. If you enjoyed this podcast, please subscribe, give it a like, and share it on social media. We’ll be back again next month with another conversation among experts.

[end of recorded material]

Podcast: Top AI Breakthroughs and Challenges of 2017 with Richard Mallah and Chelsea Finn

AlphaZero, progress in meta-learning, the role of AI in fake news, the difficulty of developing fair machine learning — 2017 was another year of big breakthroughs and big challenges for AI researchers!

To discuss this more, we invited FLI’s Richard Mallah and Chelsea Finn from UC Berkeley to join Ariel for this month’s podcast. They talked about some of the technical progress they were most excited to see and what they’re looking forward to in the coming year.

You can listen to the podcast here, or read the transcript below.

Ariel: I’m Ariel Conn with the Future of Life Institute. In 2017, we saw an increase in investments into artificial intelligence. More students are applying for AI programs, and more AI labs are cropping up around the world. With 2017 now solidly behind us, we wanted to take a look back at the year and go over some of the biggest AI breakthroughs. To do so, I have Richard Mallah and Chelsea Finn with me today.

Richard is the director of AI projects with us at the Future of Life Institute, where he does meta-research, analysis and advocacy to keep AI safe and beneficial. Richard has almost two decades of AI experience in industry and is currently also head of AI R & D at the recruiting automation firm, Avrio AI. He’s also co-founder and chief data science officer at the content marketing planning firm, MarketMuse.

Chelsea is a PhD candidate in computer science at UC Berkeley and she’s interested in how learning algorithms can enable robots to acquire common sense, allowing them to learn a variety of complex sensory motor skills in real-world settings. She completed her bachelor’s degree at MIT and has also spent time at Google Brain.

Richard and Chelsea, thank you so much for being here.

Chelsea: Happy to be here.

Richard: As am I.

Ariel: Now normally I spend time putting together questions for the guests, but today Richard and Chelsea chose the topics. Many of the breakthroughs they’re excited about were more about behind-the-scenes technical advances that may not have been quite as exciting for the general media. However, there was one exception to that, and that’s AlphaZero.

AlphaZero, which was DeepMind’s follow-up to AlphaGo, made a big splash with the popular press in December when it achieved superhuman skills at Chess, Shogi and Go without any help from humans. So Richard and Chelsea, I’m hoping you can tell us more about what AlphaZero is, how it works and why it’s a big deal. Chelsea, why don’t we start with you?

Chelsea: Yeah, so DeepMind first started with developing AlphaGo a few years ago, and AlphaGo started its learning by watching human experts play, watching how human experts play moves, how they analyze the board — and then once it analyzed and once it started with human experts, it then started learning on its own.

What’s exciting about AlphaZero is that the system started entirely on its own without any human knowledge. It started just by what’s called “self-play,” where the agent, where the artificial player is essentially just playing against itself from the very beginning and learning completely on its own.

And I think that one of the really exciting things about this research and this result was that AlphaZero was able to outperform the original AlphaGo program, and in particular was able to outperform it by removing the human expertise, by removing the human input. And so I think that this suggests that maybe if we could move towards removing the human biases and removing the human input and move more towards what’s called unsupervised learning, where these systems are learning completely on their own, then we might be able to build better and more capable artificial intelligence systems.

Ariel: And Richard, is there anything you wanted to add?

Richard: So, what was particularly exciting about AlphaZero is that it’s able to do this by essentially a technique very similar to what Paul Christiano of AI Safety fame has called “capability amplification.” It’s similar in that it’s learning a function to predict a prior or an expectation over which moves are likely at a given point, as well as function to predict which player will win. And it’s able to do these in an iterative manner. It’s able to apply what’s called an “amplification scheme” in the more general sense. In this case it was Monte Carlo tree search, but in the more general case it could be other more appropriate amplification schemes for taking a simple function and iterating it many times to make it stronger, to essentially have a leading function that is then summarized.

Ariel: So I do have a quick follow up question here. With AlphaZero, it’s a program that’s living within a world that has very strict rules. What is the next step towards moving outside of that world with very strict rules and into the much messier real world?

Chelsea: That’s a really good point. The catch with these results, with these types of games — and even video games, which are a little bit messier than the strict rules of a board game — these games, all of these games can be perfectly simulated. You can perfectly simulate what will happen when you make a certain move or when you take a certain action, either in a video game or in the game of Go or the game of Chess, et cetera. Then therefore, you can train these systems with many, many lifetimes of data.

The real physical world on the other hand, we can’t simulate. We don’t know how to simulate the complex physics of the real world. As a result, you’re limited by the number of robots that you have if you’re interested in robots, or if you’re interested in healthcare, you’re limited by the number of patients that you have. And you’re also limited by safety concerns, the cost of failure, et cetera.

I think that we still have a long way to go towards taking these sorts of advances into real world settings where there’s a lot of noise, there’s a lot of complexity in the environment, and I think that these results are inspiring, and we can take some of the ideas from these approaches and apply them to these sorts of systems, but we need to keep in mind that there are a lot of challenges ahead of us.

Richard: So between real world systems and something like the game of Go, there are also incremental improvements, like introducing this port for partial observability or more stochastic environments, or more continuous environments as opposed to the very discrete ones. So these challenges, assuming that we do have a situation where we could actually simulate what we would like to see or use a simulation to help to get training data on the fly, then in those cases, we’re likely to be able to make some progress. Using a technique like this with some extensions or with some modifications to support those criteria.

Ariel: Okay. Now, I’m not sure if this is a natural jump to the next topic or not, but you’ve both mentioned that one of the big things that you saw happening last year were new creative approaches to unsupervised learning, and Richard in an email to me you mentioned “word translation without parallel data.” So I was hoping you could talk a little bit more about what these new creative approaches are and what you’re excited about there.

Richard: So this year, we saw an application of taking vector spaces, or taking word embeddings, which are essentially these multidimensional spaces where there are relationships between points that are meaningful semantically. The space itself is learned by a relatively shallow deep-learning network, but this meaningfulness that is imbued in the space, is actually able to be used, we’ve seen this year, by taking different languages, or I should say vector spaces that were trained in different languages or created from corpora of different languages and compared, and via some techniques to sort of compare and rationalize the differences between those spaces, we’re actually able to translate words and translate things between language pairs in ways that actually, in some cases, exceed supervised approaches because typically there are parallel sets of documents that have the same meaning in different languages. But in this case, we’re able to essentially do something very similar to what the Star Trek universal translator does. By consuming enough of the alien language, or the foreign language I should say, it’s able to model the relationships between concepts and then realign those with the concepts that are known.

Chelsea, would you like to comment on that?

Chelsea: I don’t think I have too much to add. I’m also excited about the translation results and I’ve also seen similar, I guess, works that are looking at unsupervised learning, not for translation, that have a little bit of a similar vein, but they’re fairly technical in terms of the actual approach.

Ariel: Yeah, I’m wondering if either of you want to try to take a stab at explaining how this works without mentioning vector spaces?

Richard: That’s difficult because it is a space, I mean it’s a very geometric concept, and it’s because we’re aligning shapes within that space that we actually get the magic happening.

Ariel: So would it be something like you have different languages going in, some sort of document or various documents from different languages going in, and this program just sort of maps them into this space so that it figures out which words are parallel to each other then?

Richard: Well it figures out the relationship between words and based on the shape of relationships in the world, it’s able to take those shapes and rotate them into a way that sort of matches up.

Chelsea: Yeah, perhaps it could be helpful to give an example. I think that generally in language you’re trying to get across concepts, and there is structure within the language, I mean there’s the structure that you learn about in grade school when you’re learning vocabulary. You learn about verbs, you learn about nouns, you learn about people and you learn about different words that describe these different things, and different languages have shared this sort of structure in terms of what they’re trying to communicate.

And so, what these algorithms do is they are given basically data of people talking in English, or people writing documents in English, and they’re also given data in another language — and the first one doesn’t necessarily need to be English. They’re given data in one language and data in another language. This data doesn’t match up. It’s not like one document that’s been translated into another, it’s just pieces of language, documents, conversations, et cetera, and by using the structure that exists, and the data such as nouns, verbs, animals, people, it can basically figure out how to map from the structure of one language to the structure of another language. It can recognize this similar structure in both languages and then figure out basically a mapping from one to the other.

Ariel: Okay. So I think, I want to keep moving forward, but continuing with the concept of learning, and Chelsea I want to stick with you for a minute. You mentioned that there were some really big metalearning advances that occurred last year, and you also mentioned a workshop and symposium at NIPS. I was wondering if you could talk a little more about that.

Chelsea: Yeah, I think that there’s been a lot of excitement around metalearning, or learning to learn. There were two gatherings at NIPS, one symposium, one workshop this year and both were well-attended by a number of people. Actually, metalearning has a fairly long history, and so it’s by no means a recent or a new topic, but I think that it has renewed attention within the machine learning community.

And so, I guess I can describe metalearning. It’s essentially having systems that learn how to learn. There’s a number of different applications for such systems. So one of them is an application that’s often referred to as AutoML, or automatic machine learning, where these systems can essentially optimize the hyper parameters, basically figure out the best set of parameters and then run a learning algorithm with those sets of hyper parameters. Essentially kind of taking the job of the machine learning researcher that is tuning different models on different data sets. And this can basically allow people to more easily train models on a data set.

Another application of metalearning that I’m really excited about is enabling systems to reuse data and reuse experience from other tasks when trying to solve new tasks. So in machine learning, there’s this paradigm of creating everything from scratch, and as a result, if you’re training from scratch, from zero prior knowledge, then it’s going to take a lot of data. It’s going to take a lot of time to train because you’re starting from nothing. But if instead you’re starting from previous experience in a different environment or on a different task, and you can basically learn how to efficiently learn from that data, then when you see a new task that you haven’t seen before, you should be able to solve it much more efficiently.

And so, one example of this is what’s called One-Shot Learning or Few-Shot Learning, where you learn essentially how to learn from a few examples, such that when you see a new setting and you just get one or a few examples, labeled examples, labeled data points, you can figure out the new task and solve the new task just from a small number of examples.

One explicit example of how humans do this is that you can have someone point out a Segway to you on the street, and even if you’ve never seen a Segway before or never heard of the concept of a Segway, just from that one example of a human pointing out to you, you can then recognize other examples of Segways. And the way that you do that is basically by learning how to recognize objects over the course of your lifetime.

Ariel: And are there examples of programs doing this already? Or we’re just making progress towards programs being able to do this more effectively?

Chelsea: There are some examples of programs being able to do this in terms of image recognition. There’s been a number of works that have been able to do this with real images. I think that more recently we’ve started to see systems being applied to robotics, which I think is one of the more exciting applications of this setting because when you’re training a robot in the real world, you can’t have the robot collect millions of data points or days of experience in order to learn a single task. You need it to share and reuse experiences from other tasks when trying to learn a new task.

So one example of this is that you can have a robot be able to manipulate a new object that it’s never seen before based on just one demonstration of how to manipulate that object from a human.

Ariel: Okay, thanks.

I want to move to a topic that is obviously of great interest to FLI and that is technical safety advances that occurred last year. Again in an email to me, you’ve both mentioned “inverse reward design” and “deep reinforcement learning for human preferences” as two areas related to the safety issue that were advanced last year. I was hoping you could both talk a little bit about what you saw happening last year that gives you hope for developing safer AI and beneficial AI.

Richard: So, as I mentioned, both inverse reward design and deep reinforcement learning from human preferences are exciting papers that came out this year.

So inverse reward design is where the AI system is trying to understand what the original designer or what the original user intends for the system to do. So it actually tries, if it’s in some new setting, a test setting where there are some potentially problematic new things that were introduced relative to the training time, then it tries specifically to back those out or to mitigate the effects of those, so that’s kind of exciting.

Deep reinforcement learning from human preferences is an algorithm for trying to very efficiently get feedback from humans based on trajectories in the context of reinforcement learning systems. So, these are systems that are trying to learn some way to plan, let’s say a path through a game environment or in general trying to learn a policy of what to do in a given scenario. This algorithm, deep RL from human preferences, shows little snippets of potential paths to humans and has them simply choose which are better, very similar to what goes on at an optometrist. Does A look better or does B look better? And just from that, very sophisticated behaviors can be learned from human preferences in a way that was not possible before in terms of scale.

Ariel: Chelsea, is there anything that you wanted to add?

Chelsea: Yeah. So, in general, I guess, going back to AlphaZero and going back to games in general, there’s a very clear objective for achieving the goal, which is whether or not you won the game or your score at the game. It’s very clear what the objective is and what each system should be optimizing for. AlphaZero should be, like when playing Go should be optimizing for winning the game, and if a system is playing Atari games it should be optimizing for maximizing the score.

But in the real world, when you’re training systems, when you’re training agents to do things, when you’re training an AI to have a conversation with you, when you’re training a robot to set the table for you, there is no score function. The real world doesn’t just give you a score function, doesn’t tell you whether or not you’re winning or losing. And I think that this research is exciting and really important because it gives us another mechanism for telling robots, telling these AI systems how to do the tasks that we want them to do.

And for example, the human preferences work, it allows us, in sort of specifying some sort of goal that we want the robot to achieve or kind of giving it a demonstration of what we want the robot to achieve, or some sort of reward function, instead lets us say, “okay, this is not what I want, this is what I want,” throughout the process of learning. And then as a result, at the end you can basically guarantee that if it was able to optimize for your preferences successfully, then you’ll end up with behavior that you’re happy with.

Ariel: Excellent. So I’m sort of curious, before we started recording, Chelsea, you were telling me a little bit about your own research. Are you doing anything with this type of work? Or is your work a little different?

Chelsea: Yeah. So more recently I’ve been working on metalearning and so some of the metalearning works that I talked about previously, like learning just from a single demonstration and reusing data, reusing experience that you talked about previously, has been some of the things that I’ve been focusing on recently in terms of getting robots to be able to do things in the real world, such as manipulating objects, pushing objects around, using a spatula, stuff like that.

I’ve also done work on reinforcement learning where you essentially give a robot an objective, tell it to try to get the object as close as possible to the goal, and I think that the human preferences work provides a nice alternative to the classic setting, to the classic framework of reinforcement learning, that we could potentially apply to real robotic systems.

Ariel: Chelsea, I’m going to stick with you for one more question. In your list of breakthroughs that you’re excited about, one of the things that you mentioned is very near and dear to my heart, and that was better communication, and specifically better communication of the research. And I was hoping you could talk a little bit about some of the websites and methods of communicating that you saw develop and grow last year.

Chelsea: Yes. I think that more and more we’re seeing researchers put their work out in blog posts and try to make their work more accessible to the average user by explaining it in terms that are easier to understand, by motivating it in words that are easier for the average person to understand and I think that this is a great way to communicate the research in a clear way to a broader audience.

In addition, I’ve been quite excited about an effort, I think led by Chris Olah, on building what is called distill.pub. It’s a website and a journal, an academic journal, that tries to move away from this paradigm of publishing research on paper, on trees essentially. Because we have such rich digital technology that allows us to communicate in many different ways, it makes sense to move past just completely written forms of research dissemination. And I think that’s what distill.pub does, is it allows us, allows researchers to communicate research ideas in the form of animations, in the form of interactive demonstrations on a computer screen, and I think this is a big step forward and has a lot of potential in terms of moving forward the communication of research, the dissemination of research among the research community as well as beyond to people that are less familiar with the technical concepts in the field.

Ariel: That sounds awesome, Chelsea, thank you. And distill.pub is probably pretty straight forward, but we’ll still link to it on the post that goes along with this podcast if anyone wants to click straight through.

And Richard, I want to switch back over to you. You mentioned that there was more impressive output from GANs last year, generative adversarial networks.

Richard: Yes.

Ariel: Can you tell us what a generative adversarial network is?

Richard: So a generative adversarial network is an AI system where there are two parts, essentially a generator or creator that comes up with novel artifacts and a critic that tries to determine whether this is a good or legitimate or realistic type of thing that’s being generated. So both are learned in parallel as training data is streamed into the system, so in this way, the generator learns relatively efficiently how to create things that are good or realistic.

Ariel: So I was hoping you could talk a little bit about what you saw there that was exciting.

Richard: Sure, so new architectures and new algorithms and simply more horsepower as well have led to more impressive output. Particularly exciting are conditional generative adversarial networks, where there can be structured biases or new types of inputs that one wants to base some output around.

Chelsea: Yeah, I mean, one thing to potentially add is that I think the research on GANs is really exciting and I think that it will not only make advances in generating images of realistic quality, but also generating other types of things, like generating behavior potentially, or generating speech, or generating a language. We haven’t seen as much advances in those areas as generating images, thus far the most impressive advances have been in generating images. I think that those are areas to watch out for as well.

One thing to be concerned about in terms of GANs is the ability for people to generate fake images, fake videos of different events happening and putting those fake images and fake videos into the media, because while there might be ways to detect whether or not these images are made-up or are counterfeited essentially, the public might choose to believe something that they see. If you see something, you’re very likely to believe it, and this might exacerbate all of the, I guess, fake news issues that we’ve had recently.

Ariel: Yeah, so that actually brings up something that I did want to get into, and honestly, that, Chelsea, what you just talked about, is some of the scariest stuff I’ve seen, just because it seems like it has the potential to create sort of a domino effect of triggering all of these other problems just with one fake video. So I’m curious, how do we address something like that? Can we? And are there other issues that you’ve seen crop in the last year that also have you concerned?

Chelsea: I think there are potentially ways to address the problem in that if media websites, if it seems like it’s becoming a real danger in the imminent future, then I think that media websites, including social media websites, should take measures to try to be able to detect fake images and fake videos and either prevent them from being displayed or put a warning that it seems like it was detected as something that was fake, to explicitly try to mitigate the effects.

But, that said, I haven’t put that much thought into it. I do think it’s something that we should be concerned about, and the potential solution that I mentioned, I think that even if it can help solve some of the problems, I think that we don’t have a solution to the problem yet.

Ariel: Okay, thank you. I want to move on to the last question that I have that you both brought up, and that was, last year we saw an increased discussion of fairness in machine learning. And Chelsea, you mentioned there was a NIPS tutorial on this and the keynote mentioned it at NIPS as well. So I was hoping you could talk a bit about what that means, what we saw happen, and how you hope this will play out to better programs in the future.

Chelsea: So, there’s been a lot of discussion in how we can build machine-learning systems, build AI systems such that when they make decisions, they are fair and they aren’t biased. And all this discussion has been around fairness in machine learning, and actually one of the interesting things about the discussion from a technical point of view is how you even define fairness and how you define removing biases and such, because a lot of the biases are inherent to the data itself. And how you try to remove those biases can be a bit controversial.

Ariel: Can you give us some examples?

Chelsea: So one example is, if you’re trying to build an autonomous car system that is trying to avoid hitting pedestrians, and recognize pedestrians when appropriate and respond to them, then if these systems are trained in environments and in communities that are predominantly of one race, for example in Caucasian communities, and you then deploy this system in settings where there are people of color and in other environments that it hasn’t seen before, then the resulting system won’t have as good accuracy on settings that it hasn’t seen before and will be biased inherently, when it for example tries to recognize people of color, and this is a problem.

So some other examples of this is if machine learning systems are making decisions about who to give health insurance to, or speech recognition systems that are trying to recognize different speeches, if these systems are trained on a smaller part of the community that is not representative of the entire population as a whole, then they won’t be able to accurately make decisions about the entire population. Or if they’re trained on data that was collected by humans that has the same biases as humans, then they will make the same mistake, they will inherit the same biases that humans inherit, that humans have.

I think that the people that have been researching fairness in machine learning systems, unfortunately one of the conclusions that they’ve made so far is that there isn’t just a one size fits all solution to all of these different problems, and in many cases we’ll have to think about fairness in individual contexts.

Richard: Chelsea, you mentioned that some of the remediations for fairness issues in machine learning are themselves controversial. Can you go into an example or so about that?

Chelsea: Yeah, I guess part of what I meant there is that even coming up with a definition for what is fair is unclear. It’s unclear what even the problem specification is, and without a problem specification, without a definition of what you want your system to be doing, creating a system that’s fair is a challenge if you don’t have a definition for what fair is.

Richard: I see.

Ariel: So then, my last question to you both, as we look towards 2018, what are you most excited or hopeful to see?

Richard: I’m very hopeful for the FLI grants program that we announced at the very end of 2017 leading to some very interesting and helpful AI safety papers and AI safety research in general that will build on past research and break new ground and will enable additional future research to be built on top of it to make the prospect of general intelligence safer and something that we don’t need to fear as much. But that is a hope.

Ariel: And Chelsea, what about you?

Chelsea: I think I’m excited to see where metalearning goes. I think that there’s a lot more people that are paying attention to it and starting to research into “learning to learn” topics. I’m also excited to see more advances in machine learning for robotics. I think that, unlike other fields in machine learning like machine translation, image recognition, et cetera, I think that robotics still has a long way to go in terms of being useful and solving a range of complex tasks and I hope that we can continue to make strides in machine learning for robotics in the coming year and beyond.

Ariel: Excellent. Well, thank you both so much for joining me today.

Richard: Sure, thank you.

Chelsea: Yeah, I enjoyed talking to you.

 

This podcast was edited by Tucker Davey.

Rewinding the Doomsday Clock

On Thursday, the Bulletin of Atomic Scientists inched their iconic Doomsday Clock forward another thirty seconds. It is now two minutes to midnight.

Citing the growing threats of climate change, increasing tensions between nuclear-armed countries, and a general loss of trust in government institutions, the Bulletin warned that we are “making the world security situation more dangerous than it was a year ago—and as dangerous as it has been since World War II.”

The Doomsday Clock hasn’t fallen this close to midnight since 1953, a year after the US and Russia tested the hydrogen bomb, a bomb up to 1000 times more powerful than the bombs dropped on Hiroshima and Nagasaki. And like 1953, this year’s announcement highlighted the increased global tensions around nuclear weapons.

As the Bulletin wrote in their statement, “To call the world nuclear situation dire is to understate the danger—and its immediacy.”

Between the US, Russia, North Korea, and Iran, the threats of aggravated nuclear war and accidental nuclear war both grew in 2017. As former Secretary of Defense William Perry said in a statement, “The events of the past year have only increased my concern that the danger of a nuclear catastrophe is increasingly real. We are failing to learn from the lessons of history as we find ourselves blundering headfirst towards a second cold war.”

The threat of nuclear war has hovered in the background since the weapons were invented, but with the end of the Cold War, many were pulled into what now appears to have been a false sense of security. In the last year, aggressive language and plans for new and upgraded nuclear weapons have reignited fears of nuclear armageddon. The recent false missile alerts in Hawaii and Japan were perhaps the starkest reminders of how close nuclear war feels, and how destructive it would be. 

 

But the nuclear threat isn’t all the Bulletin looks at. 2017 also saw the growing risk of climate change, a breakdown of trust in government institutions, and the emergence of new technological threats.

Climate change won’t hit humanity as immediately as nuclear war, but with each year that the international community fails to drastically reduce carbon fossil fuel emissions, the threat of catastrophic climate change grows. In 2017, the US pulled out of the Paris Climate Agreement and global carbon emissions grew 2% after a two-year plateau. Meanwhile, NASA and NOAA confirmed that the past four years are the hottest four years they’ve ever recorded.

For emerging technological risks, such as widespread cyber attacks, the development of autonomous weaponry, and potential misuse of synthetic biology, the Bulletin calls for the international community to work together. They write, “world leaders also need to seek better collective methods of managing those advances, so the positive aspects of new technologies are encouraged and malign uses discovered and countered.”

Pointing to disinformation campaigns and “fake news”, the Bulletin’s Science and Security Board writes that they are “deeply concerned about the loss of public trust in political institutions, in the media, in science, and in facts themselves—a loss that the abuse of information technology has fostered.”

 

Turning Back the Clock

The Doomsday Clock is a poignant symbol of the threats facing human civilization, and it received broad media attention this week through British outlets like The Guardian and The Independent, Australian outlets such as ABC Online, and American outlets from Fox News to The New York Times.

“[The clock] is a tool,” explains Lawrence Krauss, a theoretical physicist at Arizona State University and member of the Bulletin’s Science and Security Board. “For one day a year, there are thousands of newspaper stories about the deep, existential threats that humanity faces.”

The Bulletin ends its report with a list of priorities to help turn back the Clock, chocked full of suggestions for government and industrial leaders. But the authors also insist that individual citizens have a crucial role in tackling humanity’s greatest risks.

“Leaders react when citizens insist they do so,” the authors explain. “Citizens around the world can use the power of the internet to improve the long-term prospects of their children and grandchildren. They can insist on facts, and discount nonsense. They can demand action to reduce the existential threat of nuclear war and unchecked climate change. They can seize the opportunity to make a safer and saner world.”

You can read the Bulletin’s full report here.

Podcast: Beneficial AI and Existential Hope in 2018

For most of us, 2017 has been a roller coaster, from increased nuclear threats to incredible advancements in AI to crazy news cycles. But while it’s easy to be discouraged by various news stories, we at FLI find ourselves hopeful that we can still create a bright future. In this episode, the FLI team discusses the past year and the momentum we’ve built, including: the Asilomar Principles, our 2018 AI safety grants competition, the recent Long Beach workshop on Value Alignment, and how we’ve honored one of civilization’s greatest heroes.

Full transcript:

Ariel: I’m Ariel Conn with the Future of Life Institute. As you may have noticed, 2017 was quite the dramatic year. In fact, without me even mentioning anything specific, I’m willing to bet that you already have some examples forming in your mind of what a crazy year this was. But while it’s easy to be discouraged by various news stories, we at FLI find ourselves hopeful that we can still create a bright future. But I’ll let Max Tegmark, president of FLI, tell you a little more about that.

Max: I think it’s important when we reflect back at the years news to understand how things are all connected. For example, the drama we’ve been following with Kim Jung Un and Donald Trump and Putin with nuclear weapons, is really very connected to all the developments in artificial intelligence because in both cases we have a technology which is so powerful that it’s not clear that we humans have sufficient wisdom to manage it well. And that’s why I think it’s so important that we all continue working towards developing this wisdom further, to make sure that we can use these powerful technologies like nuclear energy, like artificial intelligence, like biotechnology and so on to really help rather than to harm us.

Ariel: And it’s worth remembering that part of what made this such a dramatic year was that there were also some really positive things that happened. For example, in March of this year, I sat in a sweltering room in New York City, as a group of dedicated, caring individuals from around the world discussed how they planned to convince the United Nations to ban nuclear weapons once and for all. I don’t think anyone in the room that day realized that not only would they succeed, but by December of this year, the International Campaign to Abolish Nuclear Weapons, led by Beatrice Fihn would be awarded the Nobel Peace Prize for their efforts. And while we did what we could to help that effort, our own big story had to be the Beneficial AI Conference that we hosted in Asilomar California. Many of us at FLI were excited to talk about Asilomar, but I’ll let Anthony Aguirre, Max, and Victoria Krakovna start.

Anthony: I would say pretty unquestionably the big thing that I felt was most important and felt most excited about was the big meeting in Asilomar and centrally putting together the Asilomar Principles.

Max: I’m going to select the Asilomar conference that we organized early this year, whose output was the 23 Asilomar Principles, which has since been signed by over a thousand AI researchers around the world.

Vika: I was really excited about the Asilomar conference that we organized this year. This was the sequel to FLI’s Puerto Rico Conference, which was at the time a real game changer in terms of making AI safety more mainstream and connecting people working in AI safety with the machine learning community and integrating those two. I think Asilomar did a great job of continuing to build on that.

Max: I’m very excited about this because I feel that it really has helped mainstream AI safety work. Not just near term AI safety stuff, like how to transform today’s buggy and hackable computers into robust systems that you can really trust but also mainstream larger issues. The Asilomar Principles actually contain the word super intelligence, contain the phrase existential risk, contain the phrase recursive self improvement and yet they have been signed by really a who’s who in AI. So it’s from now on, it’s impossible for anyone to dismiss these kind of concerns, this kind of safety research. By saying, that’s just people who have no clue about AI.

Anthony: That was a process that started in 2016, brainstorming at FLI and then the wider community and then getting rounds of feedback and so on. But it was exciting both to see how much cohesion there was in the community and how much support there was for getting behind some sort of principles governing AI. But also, just to see the process unfold because one of the things that I’m quite frustrated about often is this sense that there’s this technology that’s just unrolling like a steam roller and it’s going to go where it’s going to go, and we don’t have any agency over where that is. And so to see people really putting thought into what is the world we would like there to be in ten, fifteen, twenty, fifty years and how can we distill what it is that we like about that world into principles like these…that felt really, really good. It felt like an incredibly useful thing for society as a whole but in this case, the people who are deeply engaged with AI, to be thinking through in a real way rather than just how can we put out the next fire, or how can we just turn the progress one more step forward, to really think about the destination.

Ariel: But what’s that next step? How do we transition from Principles that we all agree on to actions that we can also all get behind. Jessica Cussins joined FLI later in the year, but when asked what she was excited about as far as FLI was concerned, she immediately mentioned the implementation of things like the Asilomar Principles.

Jessica: I’m most excited about the developments we’ve seen over the last year related to safe, beneficial and ethical AI. I think FLI has been a really important player in this. We had the beneficial AI conference in January that resulted in the Asilomar AI Principles. It’s been really amazing to see how much traction those principles have gotten and to see a growing consensus around the importance of being thoughtful about the design of AI systems, the challenges of algorithmic bias of data control and manipulation and accountability and governance. So the thing I’m most excited about right now, is the growing number of initiatives we’re seeing around the world related to ethical and beneficial IA.

Anthony: What’s been great to see is the development of ideas both from FLI and from many other organizations of what policies might be good. What concrete legislative actions there might be or standards, organizations or non-profits, agreements between companies and so on might be interesting.

But I think, we’re only at the step of formulating those things and not that much action has been taken anywhere in terms of actually doing those things. Little bits of legislation here and there. But I think we’re getting to the point where lots of governments, lots of companies, lots of organizations are going to be publishing and creating and passing more and more of these things. I think seeing that play out and working really hard to ensure that it plays out in a way that’s favorable in as many ways and as many people as possible, I think is super important and something we’re excited to do.

Vika: I think that Asilomar principles are a great common point for the research community and others to agree what we are going for, what’s important.

Besides having the principles as an output, the event itself was really good for building connections between different people from interdisciplinary backgrounds, from different related fields who are interested in the questions of safety and ethics.

And we also had this workshop that was adjacent to Asilomar where our grant winners actually presented their work. I think it was great to have a concrete discussion of research and the progress we’ve made so far and not just abstract discussions of the future, and I hope that we can have more such technical events, discussing research progress and making the discussion of AI safety really concrete as time goes on.

Ariel: And what is the current state of AI safety research? Richard Mallah took on the task of answering that question for the Asilomar conference, while Tucker Davey has spent the last year interviewing various FLI grant winners to better understand their work.

Richard: I presented a landscape of technical AI safety research threads. This lays out hundreds of different types of research areas and how they are related to each other. All different areas that need a lot more research going into them than they have today to help keep AI safe and beneficent and robust. I was really excited to be at Asilomar and to have co-organized Asilomar and that so many really awesome people were there and collaborating on these different types of issues. And that they were using that landscape that I put together as sort of a touchpoint and way to coordinate. That was pretty exciting.

Tucker: I just found it really inspiring interviewing all of our AI grant recipients. It’s kind of been an ongoing project interviewing these researchers and writing about what they’re doing. Just for me, getting recently involved in AI, it’s been incredibly interesting to get either a half an hour, an hour with these researchers to talk in depth about their work and really to learn more about a research landscape that I hadn’t been aware of before working at FLI. Really, being a part of those interviews and learning more about the people we’re working with and these people that are really spearheading AI safety was really inspiring to be a part of.

Ariel: And with that, we have a big announcement.

Richard: So, FLI is launching a new grants program in 2018. This time around, we will be focusing more on artificial general intelligence, artificial super intelligence and ways that we can do technical research and other kinds of research today. On today’s systems or things that we can analyze today, things that we can model or make theoretical progress on today that are likely to actually still be relevant at the time, where AGI comes about. This is quite exciting and I’m excited to be part of the ideation and administration around that.

Max: I’m particularly excited about the new grants program that we’re launching for AI safety research. Since AI safety research itself has become so much more mainstream, since we did our last grants program three years ago, there’s now quite a bit of funding for a number of near term challenges. And I feel that we at FLI should focus on things more related to challenges and opportunities from super intelligence, since there is virtually no funding for that kind of safety research. It’s going to be really exciting to see what proposals come in and what research teams get selected by the review panels. Above all, how this kind of research hopefully will contribute to making sure that we can use this powerful technology to create a really awesome future.

Vika: I think this grant program could really build on the impact of our previous grant program. I’m really excited that it’s going to focus more on long term AI safety research, which is still the most neglected area.

AI safety has really caught on in the past two years, and there’s been a lot more work on that going on, which is great. And part of what this means is that the we at FLI can focus more on the long term. The long term work has also been getting more attention, and this grant program can help us build on that and make sure that the important problems get solved. This is really exciting.

Max: I just came back from spending a week at the NIPS Conference, the biggest artificial intelligence conference of the year. Its fascinating how rapidly everything is proceeding. AlphaZero has now defeated not just human chess players and Go players but it has also defeated human AI researchers, who after spending 30 years handcrafting artificial intelligence software to play computer chess, got all their work completely crushed by AlphaZero that just learned to do much better than that from scratch in four hours.

So, AI is really happening, whether we like it or not. The challenge we face is simply to compliment that through AI safety research and a lot of good thinking to make sure that this helps humanity flourish rather than flounder.

Ariel: In the spirit of flourishing, FLI also turned its attention this year to the movement to ban lethal autonomous weapons. While there is great debate around how to define autonomous weapons and whether or not they should be developed, more people tend to agree that the topic should at least come before the UN for negotiations. And so we helped create the video Slaughterbots to help drive this conversation. I’ll let Max take it from here.

Max: Slaughterbots, autonomous little drones that can go anonymously murder people without any human control. Fortunately, they don’t exist yet. We hope that an international treaty is going to keep it that way, even though we almost have the technology to do them already. Just need to integrate then mass produce tech we already have. So to help with this, we made this video called Slaughterbots. It was really impressive to see it get over forty million views and make the news throughout the world. I was very happy that Stewart Russell, whom we partnered with in this, also presented this to the diplomats at the United Nations in Geneva when they were discussing whether to move towards a treaty, drawing a line in the sand.

Anthony: Pushing on the autonomous weapons front, it’s been really scary, I would say to think through that issue. But a little bit like the issue of AI, in general, there’s a potential scary side but there’s also a potentially helpful side in that I think this is an issue that is a little bit tractable. Even a relatively small group of committed individuals can make difference. So I think, I’m excited to see how much movement we can get on the autonomous weapons front. It doesn’t seem at all like a hopeless issue to me and I think 2018 will be kind of a turning point — I hope that will be sort of a turning point for that issue. It’s kind of flown under the radar but it really is coming up now and it will be at least interesting. Hopefully, it will be exciting and happy and so on as well as interesting. It will at least be interesting to see how it plays out on the world stage.

Jessica: For 2018, I’m hopeful that we will see the continued growth of the global momentum against lethal autonomous weapons. Already, this year a lot has happened at the United Nations and across communities around the world, including thousands of AI and robotics researchers speaking out and saying they don’t want to see their work used to create these kinds of destabilizing weapons of mass destruction. One thing I’m really excited for 2018 is to see a louder, rallying call for an international ban of lethal autonomous weapons.

Ariel: Yet one of the biggest questions we face when trying to anticipate autonomous weapons and artificial intelligence in general, and even artificial general intelligence – one of the biggest questions is: when? When will these technologies be developed? If we could answer that, then solving problems around those technologies could become both more doable and possibly more pressing. This is an issue Anthony has been considering.

Anthony: Of most interest has been the overall set of projects to predict artificial intelligence timelines and milestones. This is something that I’ve been doing through this prediction website, Metaculus, which I’ve been a part of. And also something where I’ve took part in a very small workshop run by the Foresight Institute over the summer. It’s both a super important question because I think the overall urgency with which we have to deal with certain issues really depends on how far away they are. It’s also an instructive one, in that even posing the questions of what do we want to know exactly, really forces you to think through what is it that you care about, how would you estimate things, what different considerations are there in terms of this sort of big question.

We have this sort of big question, like when is really powerful AI going to appear? But when you dig into that, what exactly is really powerful, what exactly…  What does appear mean? Does that mean in sort of an academic setting? Does it mean becomes part of everybody’s life?

So there are all kinds of nuances to that overall big question that lots of people asking. Just getting into refining the questions, trying to pin down what it is that mean — make them exact so that they can be things that people can make precise and numerical predictions about. I think its been really, really interesting and elucidating to me and in sort of understanding what all the issues are. I’m excited to see how that kind of continues to unfold as we get more questions and more predictions and more expertise focused on that. Also, a little but nervous because the timeline seemed to be getting shorter and shorter and the urgency of the issue seems to be getting greater and greater. So that’s a bit of a fire under us, I think, to keep acting and keep a lot of intense effort on making sure that as AI gets more powerful, we get better at managing it.

Ariel: One of the current questions AI researchers are struggling with is the problem of value alignment, especially when considering more powerful AI. Meia Chita-Tegmark and Lucas Perry recently co-organized an event to get more people thinking creatively about how to address this.

Meia: So we just organized a workshop about the ethics of value alignment together with a few partner organizations, the Berggruen Institute and also CFAR.

Lucas: This was a workshop recently that took place in California and just to remind everyone, value alignment is the process by which we bring AI’s actions, goals, and intention in alignment with and in accordance with what is deemed to be the good or what are human values and preferences and goals and intentions.

Meia: And we had a fantastic group of thinkers there. We had philosophers. We had social scientists, AI researchers, political scientists. We were all discussing this very important issue of how do we get an artificial intelligence that is aligned to our own goals and our own values.

It was really important to have the perspectives of ethicists and moral psychologists, for example, because this question is not just about the technical aspect of how do you actually implement it, but also about whose values do we want implemented and who should be part of the conversation and who gets excluded and what process do we want to establish to collect all the preferences and values that we want implemented in AI. That was really fantastic. It was a very nice start to what I hope will continue to be a really fruitful collaboration between different disciplines on this very important topic.

Lucas: I think one essential take-away from that was that value alignment is truly something that is interdisciplinary. It’s normally been something which has been couched and understood in the context of technical AI safety research, but value alignment, at least in my view, also inherently includes ethics and governance. It seems that the project of creating beneficial AI through efforts and value alignment can really only happen when we have lots of different people from lots of different disciplines working together on this supremely hard issue.

Meia: I think the issue with AI is something that … first of all, it concerns such a great number of people. It concerns all of us. It will impact, and it already is impacting all of our experiences. There’re different disciplines that look at this impact from different ways.

Of course, technical AI researchers will focus on developing this technology, but it’s very important to think about how does this technology co-evolve with us. For example, I’m a psychologist. I like to think about how does it impact our own psyche. How does it impact the way we act in the world, the way we behave. Stuart Russell many times likes to point out that one danger that can come with very intelligent machines is a subtle one, not necessarily what they will do, but what we will not do because of them. He calls this enfeeblement. What are the capacities that are being stifled because we no longer engage in some of the cognitive tasks that we’re now delegating to AIs.

So that’s just one example of how, for example, psychologists can help really bring more light and make us reflect on what is it that we want from our machines and how do we want to interact with them and how do we wanna design them such that they actually empower us rather than enfeeble us.

Lucas: Yeah, I think that one essential thing to FLI’s mission and goal is the generation of beneficial AI. To me, and I think many other people coming out of this Ethics of Value Alignment conference, you know, what beneficial exactly entails and what beneficial looks like is still a really open question both in the short term and in the long-term. I’d be really interested in seeing both FLI and other organizations pursue questions in value alignment more vigorously. Issues with regard to the ethics of AI and issues regarding value and the sort of world that we want to live in.

Ariel: And what sort of world do we want to live in? If you’ve made it this far through the podcast, you might be tempted to think that all we worry about is AI. And we do think a lot about AI. But our primary goal is to help society flourish. And so this year, we created the Future of Life Award to be presented to people who act heroically to ensure our survival and hopefully move us closer to that ideal world. Our inaugural award was presented in honor of Vasili Arkhipov who stood up to his commander on a Soviet submarine, and prevented the launch of a nuclear weapon during the height of tensions in the Cold War.

Tucker: One thing that particularly stuck out to me was our inaugural Future of Life Award and we presented this award to Vasili Arkhipov who was a Soviet officer in the Cold War and arguably saved the world and is the reason we’re all alive today. He’s now passed, but FLI presented a generous award to his daughter and his grandson. It was really cool to be a part of this because it seemed like the first award of its kind.

Meia: So, of course with FLI, we have all these big projects that take a lot of time. But I think for me, one of the more exciting and heartwarming and wonderful moments that I was able to experience due to our work here at FLI was a train ride from London to Cambridge with Elena and Sergei, the daughter and the grandson of Vasili Arkhipov. Vasili Arkhipov is this Russian naval officer that helped prevent a second world war in the Cuban missile crisis. The Future of Life Institute awarded him the Future of Life prize this year. He is now dead unfortunately, but his daughter and his grandson was there in London to receive it.

Vika: It was great to get to meet them in person and to all go on stage together and have them talk about their attitude towards the dilemma that Vasili Arkhipov has faced, and how it is relevant today, and how we should be really careful with nuclear weapons and protecting our future. It was really inspiring.

At that event, Max was giving his talk about his book, and then at the end we had the Arkhipovs come up on stage and it was kind of fun for me to translate their speech to the audience. I could not fully transmit all the eloquence, but thought it was a very special moment.

Meia: It was just so amazing to really listen to their stories about the father, the grandfather, and look at photos that they had brought all the way from Moscow. This person who has become the hero for so many people that are really concerned about this essential risk, it was nice to really imagine him in his capacity as a son, as a grandfather, as a husband, as a human being. It was very inspiring and touching.

One of the nice things was they showed a photo of him that had actually notes that he had written on the back of it. That was his favorite photo. And one of the comments he made is that he felt that that was the most beautiful photo of himself because there was no glint in his eyes. It was just this pure sort of concentration. I thought that said a lot about his character. He rarely smiled in photos, also. Also always looked very pensive. Very much like you’d imagine a hero who saved the world would be.

Tucker: It was especially interesting for me to work on the press release for this award and to reach out to people from different news outlets, like The Guardian and The Atlantic, and to actually see them write about this award.

I think something like the Future of Life Award is inspiring because it highlights people in the past that have done an incredible service to civilization, but I also think it’s interesting to look forward and think about who might be the future Vasili Arkhipov that saves the world.

Ariel: As Tucker just mentioned, this award was covered by news outlets like the Guardian and the Atlantic. And in fact, we’ve been incredibly fortunate to have many of our events covered by major news. However, there are even more projects we’ve worked on that we think are just as important and that we’re just as excited about that most people probably aren’t aware of.

Jessica: So people may not know that FLI recently joined the partnership on AI. This was the group that was founded by Google and Amazon, Facebook and Apple and others to think about issues like safety, and fairness and impact from AI systems. So I’m excited about this because I think it’s really great to see this kind of social commitment from industry, and it’s going to be critical to have the support and engagement from these players to really see AI being developed in a way that’s positive for everyone. So I’m really happy that FLI is now one of the partners of what will likely be an important initiative for AI.

Anthony: I attending the first meeting of the partnership on AI in October. And to see, at that meeting, so much discussion of some of the principles themselves directly but just in a broad sense. So much discussion from all of the key organizations that are engaged with AI, that almost all of whom had representation there, about how are we going to make these things happen. If we value transparency, if we value fairness, if we value safety and trust in AI systems, how are we going to actually get together and formulate best practices and policies, and groups and data sets and things to make all that happen. And to see the speed at which, I would say the field has moved from purely, wow, we can do this, to how are we going to do this right and how are we going to do this well and what does this all mean, has been a ray of hope I would say.

AI is moving so fast but it was good to see that I think the sort of wisdom race hasn’t been conceded entirely. That there are dedicated group of people that are working really hard to figure out how to do it well.

Ariel: And then there’s Dave Stanley, who has been the force around many of the behind-the-scenes projects that our volunteers have been working on that have helped FLI grow this year.

Dave: As for another project that has very much been ongoing and more relates to the website is basically our ongoing effort to make the English content on the website that’s been fairly influential in English speaking countries about AI safety and nuclear weapons, take that content and make it available in a lot of other languages to maximize the impact that it’s having.

Right now, thanks to the efforts of our volunteers, we have 55 translations available on our website right now in nine different languages, which are Russian, Chinese, French, Polish, Spanish, German, Hindi, Japanese, and Korean. All in all, this represents about 1000 hours of volunteer time put in by our volunteers. I’d just like to give a shoutout to some of the volunteers who have been involved. They are Alan Yan, Kevin Wang, Kazue Evans, Jake Beebe, Jason Orlosky, Li Na, Bena Lim, Alina Kovtun, Ben Peterson, Carolyn Wu, Zhaoran Joanna Wang, Mayumi Nakamura, Derek Su, Dipti Pandey, Marvin, Vera Koroleva, Grzegorz Orwiński, Szymon Radziszewicz, Natalia Berezovskaya, Vladimir Nimensky, Natalia Kuzmenko, George Godula, Eric Gastfriend, Olivier Grondin, Claire Park, Kristy Wen, Yishuai Du, and Revathi Vinoth Kumar.

Ariel: As we’ve worked to establish AI safety as a global effort, Dave and the volunteers were behind the trip Richard took to China, where he participated in the Global Mobile Internet Conference in Beijing earlier this year.

Dave: So basically, this was something that was actually prompted and largely organized by one of FLIs volunteers, George Godula, who’s based in Shanghai right now.

Basically, this is partially motivated by the fact that recently, China’s been promoting a lot of investment in artificial intelligence research, and they’ve made it a national objective to become a leader in AI research by 2025. So FLI and the team have been making some efforts to basically try to build connections with China and raise awareness about AI safety, at least our view on AI safety and engage in dialogue there.

It’s culminated with George organizing this trip for Richard, and A large portion of the FLI volunteer team participating in basically support for that trip. So identifying contacts for Richard to connect with over there and researching the landscape and providing general support for that. And then that’s been coupled with an effort to take some of the existing articles that FLI has on their website about AI safety and translate those to Chinese to make it accessible to that audience.

Ariel: In fact, Richard has spoken at many conferences, workshops and other events this year, and he’s noted a distinct shift in how AI researchers view AI safety.

Richard: This is a single example of many of these things I’ve done throughout the year. Yesterday I gave a talk to a bunch of machine learning and artificial intelligence researchers and entrepreneurs in Boston, here where I’m based about AI safety and beneficence. Every time I do this it’s really fulfilling that so many of these people who really are pushing the leading edge of what AI does in many respects. They realize that these are extremely valid concerns and there are new types of technical avenues to help just keep things better for the future. The facts that I’m not receiving push back anymore as compared to many years ago when I would talk about these things — that people really are trying to gauge and understand and kind of weave themselves into whatever is going to turn into the best outcome for humanity. Given the type of leverage that advanced AI will bring us. I think people are starting to really get what’s at stake.

Ariel: And this isn’t just the case among AI researchers. Throughout the year, we’ve seen this discussion about AI safety broaden into various groups outside of traditional AI circles, and we’re hopeful this trend will continue in 2018.

Meia: I think that 2017 has been fantastic to start this project of getting more thinkers from different disciplines to really engage with the topic of artificial intelligence, but I think we are just manage to scratch the surface of this topic in this collaboration. So I would really like to work more on strengthening this conversation and this flow of ideas between different disciplines. I think we can achieve so much more if we can make sure that we hear each other, that we go past our own disciplinary jargon, and that we truly are able to communicate and join each other in research projects where we can bring different tools and different skills to the table.

Ariel: The landscape on AI safety research that Richard presented at Asilomar at the start of the year was designed to enable greater understanding among researchers. Lucas rounded off the year with another version of the landscape. This one looking at ethics and value alignment with the goal, in part, of bringing more experts from other fields into the conversation.

Lucas: One thing that I’m also really excited about for next year is seeing our conceptual landscapes of both AI safety and value alignment being used in more educational context and in context in which they can foster interdisciplinary conversations regarding issues in AI. I think that their virtues are that they create a conceptual landscape of both AI safety and value alignment, but also include definitions and descriptions of jargon. Given this, it functions both as a means by which you can introduce people to AI safety and value alignment and AI risk, but it also serves as a means of introducing experts to sort of the conceptual mappings of the spaces that other experts are engaged with and so they can learn each other’s jargon and really have conversations that are fruitful and sort of streamlined.

Ariel: As we look to 2018, we hope to develop more programs, work on more projects, and participate in more events that will help draw greater attention to the various issues we care about. We hope to not only spread awareness, but also to empower people to take action to ensure that humanity continues to flourish in the future.

Dave: There’s a few things that are coming up that I’m really excited about. The first one is basically we’re going to be trying to release some new interactive apps on the website that’ll hopefully be pages that can gather a lot of attention and educate people about the issues that we’re focused on, mainly nuclear weapons, and answering questions to give people a better picture of what are the geopolitical and economic factors that motivate countries to keep their nuclear weapons and how does this relate to public support, based on polling data, for whether the general public wants to keep these weapons or not.

Meia: One thing that I think has made me also very excited in 2017, and I’m looking forward to seeing the evolution of in 2018 was the public’s engagement with this topic. I’ve had the luck to be in the audience for many of the book talks that Max has given for his book “Life 3.0: Being Human in the Age of Artificial Intelligence,” and it was fascinating just listening to the questions. They’ve become so much more sophisticated and nuanced than a few years ago. I’m very curious to see how this evolves in 2018, and I hope that FLI will contribute to this conversation and making it more rich. I think I’d like people in general to get engaged with this topic much more, and refine their understanding of it.

Tucker: Well, I think in general it’s been amazing to watch FLI this year because we’ve made big splashes in so many different things with the Asilomar conference, with our Slaughterbots video, helping with the nuclear ban, but I think one thing that I’m particularly interested in is working more this coming year to I guess engage my generation more on these topics. I sometimes sense a lot of defeatism and hopelessness with people in my generation. Kind of feeling like there’s nothing we can do to solve civilization’s biggest problems. I think being at FLI has kind of given me the opposite perspective. Sometimes I’m still subject to that defeatism, but working here really gives me a sense that we can actually do a lot to solve these problems. I’d really like to just find ways to engage more people in my generation to make them feel like they actually have some sense of agency to solve a lot of our biggest challenges.

Ariel: Learn about these issues and more, join the conversation, and find out how you can get involved by visiting futureoflife.org.

[end]

 

Podcast: Balancing the Risks of Future Technologies with Andrew Maynard and Jack Stilgoe

What does it means for technology to “get it right,” and why do tech companies ignore long-term risks in their research? How can we balance near-term and long-term AI risks? And as tech companies become increasingly powerful, how can we ensure that the public has a say in determining our collective future?

To discuss how we can best prepare for societal risks, Ariel spoke with Andrew Maynard and Jack Stilgoe on this month’s podcast. Andrew directs the Risk Innovation Lab in the Arizona State University School for the Future of Innovation in Society, where his work focuses on exploring how emerging and converging technologies can be developed and used responsibly within an increasingly complex world. Jack is a senior lecturer in science and technology studies at University College London where he works on science and innovation policy with a particular interest in emerging technologies.

The following transcript has been edited for brevity, but you listen to the podcast above or read the full transcript here.

Ariel: Before we get into anything else, could you first define what risk is?

Andrew: The official definition of risk is it looks at the potential of something to cause harm, but it also looks at the probability. Say you’re looking at exposure to a chemical, risk is all about the hazardous nature of that chemical, its potential to cause some sort of damage to the environment or the human body, but then exposure that translates that potential into some sort of probability. That is typically how we think about risk when we’re looking at regulating things.

I actually think about risk slightly differently, because that concept of risk runs out of steam really fast, especially when you’re dealing with uncertainties, existential risk, and perceptions about risk when people are trying to make hard decisions and they can’t make sense of the information they’re getting. So I tend to think of risk as a threat to something that’s important or of value. That thing of value might be your health, it might be the environment; but it might be your job, it might be your sense of purpose or your sense of identity or your beliefs or your religion or your politics or your worldview.

As soon as we start thinking about risk in that sense, it becomes much broader, much more complex, but it also allows us to explore that intersection between different communities and their different ideas about what’s important and worth protecting.

Jack: I would draw attention to all of those things that are incalculable. When we are dealing with new technologies, they are often things to which we cannot assign probabilities and we don’t know very much about what the likely outcomes are going to be.

I think there is also a question of what isn’t captured when we talk about risk. Not all of the impacts of technology might be considered risk impacts. I’d say that we should also pay attention to all the things that are not to do with technology going wrong, but are also to do with technology going right. Technologies don’t just create new risks, they also benefit some people more than others. And they can create huge inequalities. If they’re governed well, they can also help close inequalities. But if we just focus on risk, then we lose some of those other concerns as well.

Andrew: Jack, so this obviously really interests me because to me an inequality is a threat to something that’s important to someone. Do you have any specific examples of what you think about when you think about inequalities or equality gaps?

Jack: Before we get into examples, the important thing is to bear in mind a trend with technology, which is that technology tends to benefit the powerful. That’s an overall trend before we talk about any specifics, which quite often goes against the rhetoric of technological change, because, often, technologies are sold as being emancipatory and helping the worst off in society – which they do, but typically they also help the better off even more. So there’s that general question.

I think in the specific, we can talk about what sorts of technologies do close inequities and which tend to exacerbate inequities. But it seems to me that just defining that as a social risk isn’t quite getting there.

Ariel: I would consider increasing inequality to be a risk. Can you guys talk about why it’s so hard to get agreement on what we actually define as a risk?

Andrew: People very quickly slip into defining risk in very convenient ways. So if you have a company or an organization that really wants to do something – and that doing something may be all the way from making a bucket load of money to changing the world in the ways they think are good – there’s a tendency for them to define risk in ways that benefit them.

So, for instance, if you are the maker of an incredibly expensive drug, and you work out that that drug is going to be beneficial in certain ways with minimal side effects, but it’s only going to be available to a very few very rich number of people, you will easily define risk in terms of the things that your drug does not do, so you can claim with confidence that this is a risk-free or a low-risk product. But that’s an approach where you work out where the big risks are with your product and you bury them and you focus on the things where you think there is not a risk with your product.

That sort of extends across many, many different areas – this tendency to bury the big risks associated with a new technology and highlight the low risks to make your tech look much better than it is so you can reach the aims that you’re trying to achieve.

Jack: I quite agree, Andrew. I think what tends to happen is that the definition of risk gets socialized as being that stuff that society’s allowed to think about whereas the benefits are sort of privatized. The innovators are there to define who benefits and in what ways.

Andrew: I would agree. Though it also gets quite complex in terms of the social dialogue around that and who actually is part of those conversations and who has a say in those conversations.

To get back to your point, Ariel, I think there are a lot of organizations and individuals that want to do what they think is the right thing. But they also want the ability to decide for themselves what the right thing is rather than listening to other people.

Ariel: How do we address that?

Andrew: It’s a knotty problem, and it has its roots in how we are as people and as a society, how we’ve evolved. I think there are a number of ways forwards towards beginning to sort of pick apart the problem. A lot of those are associated with work that is carried out in the social sciences and humanities around how you make these processes more inclusive, how you bring more people to the table, how you begin listening to different perspectives, different sets of values and incorporating them into decisions rather than marginalizing groups that are inconvenient.

Jack: If you regard these things as legitimately political discussions rather than just technical discussions, then the solution is to democratize them and to try to wrest control over the direction of technology away from just the innovators and to see that as the subject of proper democratic conversation.

Andrew: And there are some very practical things here. This is where Jack and I might actually diverge in our perspectives. But from a purely business sense, if you’re trying to develop a new product or a new technology and get it to market, the last thing you can afford to do is ignore the nature of the population, the society that you’re trying to put that technology into. Because if you do, you’re going to run up against roadblocks where people decide they either don’t like the tech or they don’t like the way that you’ve made decisions around it or they don’t like the way that you’ve implemented it.

So from a business perspective, taking a long-term strategy, it makes far more sense to engage with these different communities and develop a dialogue around them so you understand the nature of the landscape that you’re developing a technology into. You can see ways of partnering with communities to make sure that that technology really does have a broad beneficial impact.

Ariel: Why do you think companies resist doing that?

Andrew: I think we’ve had centuries of training that says you don’t ask awkward questions because they potentially lead to you not being able to do what you want to do. It’s partly the mentality around innovation. But, also, it’s hard work. It takes a lot of effort, and it actually takes quite a lot of humility as well.

Jack: There’s a sort of well-defined law in technological change, which is that we overestimate the effect of technology in the short term and underestimate the effect of technology in the long term. Given that companies and innovators have to make short time horizon decisions, often they don’t have the capacity to take on board these big world-changing implications of technology.

If you look at something like the motorcar, it would have been inconceivable for Henry Ford to have imagined the world in which his technology would exist in 50 years time. Even though we know that the motorcar has led to the reshaping of large parts of America. It’s led to an absolutely catastrophic level of public health risk while also bringing about clear benefits of mobility. But those are big long-term changes that evolve very slowly, far slower than any company could appreciate.

Andrew: So can I play devil’s advocate here, Jack? With hindsight should Henry Ford have developed his production line process differently to avoid some of the impacts we now see of motor vehicles?

Jack: You’re right to say with hindsight it’s really hard to see what he might have done differently, because the point is the changes that I was talking about are systemic ones with responsibility shared across large parts of the system. Now, could we have done better at anticipating some of those things? Yes, I think we could have done, and I think had motorcar manufacturers talked to regulators and civil society at the time, they could have anticipated some of those things because there are also barriers that stop innovators from anticipating. There are actually things that force innovators time horizons to narrow.

Andrew: That’s one of the points that really interests me. It’s not this case of “do we, don’t we” with a certain technology, but could we do things better so we see more longer-term benefits and we see fewer hurdles that maybe we could have avoided if we had been a little smarter from the get-go.

Ariel: But how much do you think we can actually anticipate?

Andrew: Well, the basic answer is very little indeed. The one thing that we know about anticipating the future is that we’re always going to get it wrong. But I think that we can put plausible bounds around likely things that are going to happen. Simply from what we know about how people make decisions and the evidence around that, we know that if you ignore certain pieces of information, certain evidence, you’re going to make worse decisions in terms of projecting or predicting future pathways than if you’re actually open to evaluating different types of evidence.

By evidence, I’m not just meaning the scientific evidence, but I’m also thinking about what people believe or hold as valuable within society and what motivates them to do certain things and react in certain ways. All of that is important evidence in terms of getting a sense of what the boundaries are of a future trajectory.

Jack: Yes, we will always get our predictions wrong, but if anticipation is about preparing us for the future rather than predicting the future, then rightness or wrongness isn’t really the target. Instead, I would draw attention to the history of cases in which there has been willful ignorance of particular perspectives or particular evidence that has only been realized later – which, as you know better than anybody, the evidence of public health risk that has been swept under the carpet. We have to look first at the sort of incentives that prompt innovators to overlook that evidence.

Andrew: I think that’s so important. It’s worthwhile bringing up the Late lessons from early warnings report that came out of Europe a few years ago, which were a series of case studies of previous technological innovations over the last 100 years or so, looking at where innovators and companies and even regulators either missed important early warnings or willfully ignored them, and that led to far greater adverse impacts than there really should have been. I think there are a lot of lessons to be learned from those.

Ariel: I’d like to take that and move into some more specific examples now. Jack, I know you’re interested in self-driving vehicles. I was curious, how do we start applying that to these new technologies that will probably be, literally, on the road soon?

Jack: It’s extremely convenient for innovators to define risks in particular ways that suit their own ambitions. I think you see this in the way that the self-driving cars debate is playing out. In part, that’s because the debate is a largely American one and it emanates from an American car culture.

Here in Europe, we see a very different approach to transport with a very different emerging debate. So the trolley problem, the classic example of a risk issue where engineers very conveniently are able to treat it as an algorithmic challenge. How do we maximize public benefits and reduce public risk? Here in Europe where our transport systems are complicated, multimodal; where our cities are complicated, messy things, the self-driving car risks start to expand pretty substantially in all sorts of dimensions.

So the sorts of concerns that I would see for the future of self-driving cars relate more to what are sometimes called second order consequences. What sorts of worlds are these technologies likely to enable? What sorts of opportunities are they likely to constrain? I think that’s a far more important debate than the debate about how many lives a self-driving car will either save or take in its algorithmic decision-making.

Andrew: Jack, you have referred to the trolley problem as trolleys and follies. One of the things I really grapple with, and I think it’s very similar to what you were saying, is that the trolley problem seems to be a false or a misleading articulation of risk. It’s something which is philosophical and hypothetical, but actually doesn’t seem to bear much relation to the very real challenges and opportunities that we’re grappling with with these technologies.

Now, the really interesting thing here is, I get really excited about the self-driving vehicle technologies, partly living here in Tempe where Google and Uber and various other companies are testing them on the road now. But you have quite a different perspective in terms of how fast we’re going with the technology and how little thought there is into the longer term social consequences. But to put my full cards on the table, I can’t wait for better technologies in this area.

Jack: Well, without wishing to be too congenial, I am also excited about the potential for the technology. But what I know about past technology suggests that it may well end up gloriously suboptimal. I’m interested in a future involving self-driving cars that might actually realize some of the enormous benefits to, for example, bringing accessibility to people who currently can’t drive. The enormous benefits to public safety, to congestion, but making that work will not just involve a repetition of current dynamics of technological change. I think current ownership models in the US, current modes of transport in the US just are not conducive to making that happen. So I would love to see governments taking control of this and actually making it work in the same way as in the past, governments have taken control of transport and built public value transport systems.

Ariel: If governments are taking control of this and they’re having it done right, what does that mean?

Jack: The first thing that I don’t see any of within the self-driving car debate, because I just think we’re at too early a stage, is an articulation of what we want from self-driving cars. We have the Google vision, the Waymo vision of the benefits of self-driving cars, which is largely about public safety. But no consideration of what it would take to get that right. I think that’s going to look very different. I think to an extent Tempe is an easy case, because the roads in Arizona are extremely well organized. It’s sunny, pedestrians behave themselves. But what you’re not going to be able to do is take that technology and transport it to central London and expect it to do the same job.

So some understanding of desirable systems across different places is really important. That, I’m afraid, does mean sharing control between the innovators and the people who have responsibility for public safety, public transport and public space.

Andrew: Even though most people in this field and other similar fields are doing it for what they claim is for future benefits and the public good, there’s a huge gap between good intentions of doing the right thing and actually being able to achieve something positive for society. I think the danger is that good intentions go bad very fast if you don’t have the right processes and structures in place to translate them into something that benefits society. To do that, you’ve got to have partnerships and engagement with agencies and authorities that have oversight over these technologies, but also the communities and the people that are either going to be impacted by them or benefit by them.

Jack: I think that’s right. Just letting the benefits as stated by the innovators speak for themselves hasn’t worked in the past, and it won’t work here. We have to allow some sort of democratic discussion about that.

Ariel: I want to move forward in the future to more advanced technology, looking at more advanced artificial intelligence, even super intelligence. How do we address risks that are associated with that when a large number of researchers don’t even think this technology can be developed, or if it is developed, it’s still hundreds of years away? How do you address these really big unknowns and uncertainties?

Andrew: That’s a huge question. So I’m speaking here as something of a cynic of some of the projections of superintelligence. I think you’ve got to develop a balance between near and mid-term risks, but at the same time, work out how you take early action on trajectories so you’re less likely to see the emergence of those longer-term existential risks. One of the things that actually really concerns me here is if you become too focused on some of the highly speculative existential risks, you end up missing things which could be catastrophic in a smaller sense in the near to mid-term.

Pouring millions upon millions of dollars into solving a hypothetical problem around superintelligence and the threat to humanity sometime in the future, at the expense of looking at nearer-term things such as algorithmic bias, autonomous decision-making that cuts people out of the loop and a whole number of other things, is a risk balance that doesn’t make sense to me. Somehow, you’ve got to deal with these emerging issues, but in a way which is sophisticated enough that you’re not setting yourself up for problems in the future.

Jack: I think getting that balance right is crucial. I agree with your assessment that that balance is far too much, at the moment, in the direction of the speculative and long-term. One of the reasons why it is, is because that’s an extremely interesting set of engineering challenges. So I think the question would be on whose shoulders does the responsibility lie for acting once you recognize threats or risks like that? Typically, what you find when a community of scientists gathers to assess risks is that they frame the issue in ways that lead to scientific or technical solutions. It’s telling, I think, that in the discussion about superintelligence, the answer, either in the foreground or in the background, is normally more AI not less AI. And the answer is normally to be delivered by engineers rather than to be governed by politicians.

That said, I think there’s sort of cause for optimism if you look at the recent campaign around autonomous weapons. That would seem to be a clear recognition of a technologically mediated issue where the necessary action is not on the part of the innovators themselves but on all the people who are in control of our armed forces.

Andrew: I think you’re exactly right, Jack. I should clarify that even though there is a lot of discussion around speculative existential risks, there is also a lot of action on nearer-term issues such as the lethal autonomous weapons. But one of the things that I’ve been particularly struck with in conversations is the fear amongst technologists of losing control over the technology and the narrative. I’ve had conversations where people have said that they’re really worried about the potential down sides, the potential risks of where artificial intelligence is going. But they’re convinced that they can solve those problems without telling anybody else about them, and they’re scared that if they tell a broad public about those risks that they’ll be inhibited in doing the research and the development that they really want to do.

That really comes down to not wanting to relinquish control over technology. But I think that there has to be some relinquishment there if we’re going to have responsible development of these technologies that really focuses on how they could impact people both in the short as well as the long-term, and how as a society we find pathways forwards.

Ariel: Andrew, I’m really glad you brought that up. That’s one that I’m not convinced by, this idea that if we tell the public what the risks are, then suddenly the researchers won’t be able to do the research they want. Do you see that as a real risk for researchers?

Andrew: I think there is a risk there, but it’s rather complex. Most of the time, the public actually don’t care about these things. There are one or two examples; genetically modifying organisms is the one that always comes up. But that is a very unique and very distinct example. Most of the time, if you talk broadly about what’s happening with a new technology, people will say, that’s interesting, and get on with their lives. So there’s much less risk there about talking about it than I think people realize.

The other thing, though, is even if there is a risk of people saying “hold on a minute, we don’t like what’s happening here,” better to have that feedback sooner rather than later, because the reality is people are going to find out what’s happening. If they discover as a company or a research agency or a scientific group that you’ve been doing things that are dangerous and you haven’t been telling them about it, when they find out after the fact, people get mad. That’s where things get really messy.

[What’s also] interesting – you’ve got a whole group of people in the technology sphere who are very clearly trying to do what they think is the right thing. They’re not in it primarily for fame and money, but they’re in it because they believe that something has to change to build a beneficial future.

The challenge is, these technologists, if they don’t realize the messiness of working with people and society and they think just in terms of technological solutions, they’re going to hit roadblocks that they can’t get over. So this to me is why it’s really important that you’ve got to have the conversations. You’ve got to take the risk to talk about where things are going with the broader population. You’ve got to risk your vision having to be pulled back a little bit so it’s more successful in the long-term.

Ariel: I was hoping you could both touch on the impact of media as well and how that’s driving the discussion.

Jack: I think blaming the media is always the convenient thing to do. They’re the convenient target. I think the question is about actually the culture, which is extremely technologically utopian and which wants to believe that there are simple technological solutions to some of our most pressing problems. In that culture, it is understandable if seemingly seductive ideas, whether about artificial intelligence or about new transport systems, are taken. I would love there to be a more skeptical attitude so that when those sorts of claims are made, just as when any sort of political claim is made, that they are scrutinized and become the starting point for a vigorous debate about the world in which we want to live in. I think that is exactly what is missing from our current technological discourse.

Andrew: The media is a product of society. We are titillated by extreme, scary scenarios. The media is a medium through which that actually happens. I work a lot with journalists, and I’ve had very few experiences with being misrepresented or misquoted where it wasn’t my fault in the first place.

So I think we’ve got to think of two things when we think of media coverage. First of all, we’ve got to get smarter in how we actually communicate, and by we I mean the people that feel we’ve got something to say here. We’ve got to work out how to communicate in a way that makes sense with the journalists and the media that we’re communicating through. We’ve also got to realize that even though we might be outraged by a misrepresentation, that usually doesn’t get as much traction in society as we think it does. So we’ve got to be a little bit more laid back about how we see things reported.

Ariel: Is there anything else that you think is important to add?

Andrew: I would just sort of wrap things up. There has been a lot of agreement, but actually, and this is an important thing, it’s because most people, including people that are often portrayed as just being naysayers, are trying to ask difficult questions so we can actually build a better future through technology and through innovation in all its forms. I think it’s really important to realize that just because somebody asks difficult questions doesn’t mean they’re trying to stop progress, but they’re trying to make sure that that progress is better for everybody.

Jack: Hear, hear.

Podcast: AI Ethics, the Trolley Problem, and a Twitter Ghost Story with Joshua Greene and Iyad Rahwan

As technically challenging as it may be to develop safe and beneficial AI, this challenge also raises some thorny questions regarding ethics and morality, which are just as important to address before AI is too advanced. How do we teach machines to be moral when people can’t even agree on what moral behavior is? And how do we help people deal with and benefit from the tremendous disruptive change that we anticipate from AI?

To help consider these questions, Joshua Greene and Iyad Rawhan kindly agreed to join the podcast. Josh is a professor of psychology and member of the Center for Brain Science Faculty at Harvard University, where his lab has used behavioral and neuroscientific methods to study moral judgment, focusing on the interplay between emotion and reason in moral dilemmas. He’s the author of Moral Tribes: Emotion, Reason and the Gap Between Us and Them. Iyad is the AT&T Career Development Professor and an associate professor of Media Arts and Sciences at the MIT Media Lab, where he leads the Scalable Cooperation group. He created the Moral Machine, which is “a platform for gathering human perspective on moral decisions made by machine intelligence.”

In this episode, we discuss the trolley problem with autonomous cars, how automation will affect rural areas more than cities, how we can address potential inequality issues AI may bring about, and a new way to write ghost stories.

This transcript has been heavily edited for brevity. You can read the full conversation here.

Ariel: How do we anticipate that AI and automation will impact society in the next few years?

Iyad: AI has the potential to extract better value from the data we’re collecting from all the gadgets, devices and sensors around us. We could use this data to make better decisions, whether it’s micro-decisions in an autonomous car that takes us from A to B safer and faster, or whether it’s medical decision-making that enables us to diagnose diseases better, or whether it’s even scientific discovery, allowing us to do science more effectively, efficiently and more intelligently.

Joshua: Artificial intelligence also has the capacity to displace human value. To take the example of using artificial intelligence to diagnose disease. On the one hand it’s wonderful if you have a system that has taken in all of the medical knowledge we have in a way that no human could and uses it to make better decisions. But at the same time that also means that lots of doctors might be out of a job or have a lot less to do. This is the double-edged sword of artificial intelligence, the value it creates and the human value that it displaces.

Ariel: Can you explain what the trolley problem is and how does that connect to this question of what do autonomous vehicles do in situations where there is no good option?

Joshua: One of the original versions of the trolley problem goes like this (we’ll call it “the switch case”): A trolley is headed towards five people and if you don’t do anything, they’re going to be killed, but you can hit a switch that will turn the trolley away from the five and onto a side track. However on that side track, there’s one unsuspecting person and if you do that, that person will be killed.

The question is: is it okay to hit the switch to save those five people’s lives but at the cost of saving one life? In this case, most people tend to say yes. Then we can vary it a little bit. In “the footbridge case,” the situation is different as follows: the trolley is now headed towards five people on a single track, over that track is a footbridge and on that footbridge is a large person wearing a very large backpack. You’re also on the bridge and the only way that you can save those five people from being hit by the trolley is to push that big person off of the footbridge and onto the tracks below.

Assume that it will work, do you think it’s okay to push the guy off the footbridge in order to save five lives? Here, most people say no, and so we have this interesting paradox. In both cases, you’re trading one life for five, yet in one case it seems like it’s the right thing to do, in the other case it seems like it’s the wrong thing to do.

One of the classic objections to these dilemmas is that they’re unrealistic. My view is that the point is not that they’re realistic, but instead that they function like high contrast stimuli. If you’re a vision researcher and you’re using flashing black and white checkerboards to study the visual system, you’re not using that because that’s a typical thing that you look at, you’re using it because it’s something that drives the visual system in a way that reveals its structure and dispositions.

In the same way, these high contrast, extreme moral dilemmas can be useful to sharpen our understanding of the more ordinary processes that we bring to moral thinking.

Iyad: The trolley problem can translate in a cartoonish way to a scenario with which an autonomous car is faced with only two options. The car is going at a speed limit on a street and due to mechanical failure is unable to stop and is going to hit it a group of five pedestrians. The car can swerve and hit a bystander. Should the car swerve or should it just plow through the five pedestrians?

This has a structure similar to the trolley problem because you’re making similar tradeoffs between one and five people and the decision is not being taken on the spot, it’s actually happening at the time of the programming of the car.

There is another complication in which the person being sacrificed to save the greater number of people is the person in the car. Suppose the car can swerve to avoid the five pedestrians but as a result falls off a cliff. That adds another complication especially that programmers are going to have to appeal to customers. If customers don’t feel safe in those cars because of some hypothetical situation that may take place in which they’re sacrificed, that pits the financial incentives against the potentially socially desirable outcome, which can create problems.

A question that raises itself is: Is it going to ever happen? How many times do we face these kinds of situations as we drive today? So the argument goes: these situations are going to be so rare that they are irrelevant and that autonomous cars promise to be substantially safer than human-driven cars that we have today, that the benefits significantly outweigh the costs.

There is obviously truth to this argument, if you take the trolley problem scenario literally. But what the autonomous car version of the trolley problem is doing, is it’s abstracting the tradeoffs that are taking place every microsecond, even now.

Imagine you’re driving on the road and there is a large truck on the lane to your left and as a result you choose to stick a little bit further to the right, just to minimize risk in case this car gets off its lane. Now suppose that there could be a cyclist later on the right hand side, what you’re effectively doing in this small maneuver is slightly reducing risk to yourself but slightly increasing risk to the cyclist. These sorts of decisions are being made millions and millions of times every day.

Ariel: Applying the trolley problem to self-driving cars seems to be forcing the vehicle and thus the programmer of the vehicle to make a judgment call about whose life is more valuable. Can we not come up with some other parameters that don’t say that one person’s life is more valuable than someone else’s?

Joshua: I don’t think that there’s any way to avoid doing that. If you’re a driver, there’s no way to avoid answering the question, how cautious or how aggressive am I going to be. You can not explicitly answer the question; you can say I don’t want to think about that, I just want to drive and see what happens. But you are going to be implicitly answering that question through your behavior, and in the same way, autonomous vehicles can’t avoid the question. Either the people who are designing the machines, training the machines or explicitly programming to behave in certain ways, they are going to do things that are going to affect the outcome.

The cars will constantly be making decisions that inevitably involve value judgments of some kind.

Ariel: To what extent have we actually asked customers what it is that they want from the car? In a completely ethical world, I would like the car to protect the person who’s more vulnerable, who would be the cyclist. In practice, I have a bad feeling I’d probably protect myself.

Iyad: We could say we want to treat everyone equally. On the other hand, you have this self-protective instinct which presumably as a consumer, that’s what you want to buy for yourself and your family. On the other hand you also care for vulnerable people. Different reasonable and moral people can disagree on what the more important factors and considerations should be and I think this is precisely why we have to think about this problem explicitly, rather than leave it purely to – whether it’s programmers or car companies or any particular single group of people – to decide.

Joshua: When we think about problems like this, we have a tendency to binarize it, but it’s not a binary choice between protecting that person or not. It’s really going to be matters of degree. Imagine there’s a cyclist in front of you going at cyclist speed and you either have to wait behind this person for another five minutes creeping along much slower than you would ordinarily go, or you have to swerve into the other lane where there’s oncoming traffic at various distances. Very few people might say I will sit behind this cyclist for 10 minutes before I would go into the other lane and risk damage to myself or another car. But very few people would just blow by the cyclist in a way that really puts that person’s life in peril.

It’s a very hard question to answer because the answers don’t come in the form of something that you can write out in a sentence like, “give priority to the cyclist.” You have to say exactly how much priority in contrast to the other factors that will be in play for this decision. And that’s what makes this problem so interesting and also devilishly hard to think about.

Ariel: Why do you think this is something that we have to deal with when we’re programming something in advance and not something that we as a society should be addressing when it’s people driving?

Iyad: We very much value the convenience of getting from A to B. Our lifetime odds of dying from a car accident is more than 1%, yet somehow, we’ve decided to put up with this because of the convenience. As long as people don’t run through a red light or are not drunk, you don’t really blame them for fatal accidents, we just call them accidents.

But now, thanks to autonomous vehicles that can make decisions and reevaluate situations hundreds or thousands of times per second and adjust their plan and so on – we potentially have the luxury to make those decisions a bit better and I think this is why things are different now.

Joshua: With the human we can say, “Look, you’re driving, you’re responsible, and if you make a mistake and hurt somebody, you’re going to be in trouble and you’re going to pay the cost.” You can’t say that to a car, even a car that’s very smart by 2017 standards. The car isn’t going to be incentivized to behave better – the motivation has to be explicitly trained or programmed in.

Iyad: Economists say you can incentivize the people who make the cars to program them appropriately by fining them and engineering the product liability law in such a way that would hold them accountable and responsible for damages, and this may be the way in which we implement this feedback loop. But I think the question remains what should the standards be against which we hold those cars accountable.

Joshua: Let’s say somebody says, “Okay, I make self-driving cars and I want to make them safe because I know I’m accountable.” They still have to program or train the car. So there’s no avoiding that step, whether it’s done through traditional legalistic incentives or other kinds of incentives.

Ariel: I want to ask about some other research you both do. Iyad you look at how AI and automation impact us and whether that could be influenced by whether we live in smaller towns or larger cities. Can you talk about that?

Iyad: Clearly there are areas that may potentially benefit from AI because it improves productivity and it may lead to greater wealth, but it can also lead to labor displacement. It could cause unemployment if people aren’t able to retool and improve their skills so that they can work with these new AI tools and find employment opportunities.

Are we expected to experience this in a greater way or in a smaller magnitude in smaller versus bigger cities? On one hand there are lots of creative jobs in big cities and, because creativity is so hard to automate, it should make big cities more resilient to these shocks. On the other hand if you go back to Adam Smith and the idea of the division of labor, the whole idea is that individuals become really good at one thing. And this is precisely what spurred urbanization in the first industrial revolution. Even though the system is collectively more productive, individuals may be more automatable in terms of their narrowly-defined tasks.

But when we did the analysis, we found that indeed larger cities are more resilient in relative terms. The preliminary findings are that in bigger cities there is more production that requires social interaction and very advanced skills like scientific and engineering skills. People are better able to complement the machines because they have technical knowledge, so they’re able to use new intelligent tools that are becoming available, but they also work in larger teams on more complex products and services.

Ariel: Josh, you’ve done a lot of work with the idea of “us versus them.” And especially as we’re looking in this country and others at the political situation where it’s increasingly polarized along this line of city versus smaller town, do you anticipate some of what Iyad is talking about making the situation worse?

Joshua: I certainly think we should be prepared for the possibility that it will make the situation worse. The central idea is that as technology advances, you can produce more and more value with less and less human input, although the human input that you need is more and more highly skilled.

If you look at something like Turbo Tax, before you had lots and lots of accountants and many of those accountants are being replaced by a smaller number of programmers and super-expert accountants and people on the business side of these enterprises. If that continues, then yes, you have more and more wealth being concentrated in the hands of the people whose high skill levels complement the technology and there is less and less for people with lower skill levels to do. Not everybody agrees with that argument, but I think it’s one that we ignore at our peril.

Ariel: Do you anticipate that AI itself would become a “them,” or do you think it would be people working with AI versus people who don’t have access to AI?

Joshua: The idea of the AI itself becoming the “them,” I am agnostic as to whether or not that could happen eventually, but this would involve advances in artificial intelligence beyond anything we understand right now. Whereas the problem that we were talking about earlier – humans being divided into a technological, educated, and highly-paid elite as one group and then the larger group of people who are not doing as well financially – that “us-them” divide, you don’t need to look into the future, you can see it right now.

Iyad: I don’t think that the robot will be the “them” on their own, but I think the machines and the people who are very good at using the machines to their advantage, whether it’s economic or otherwise, will collectively be a “them.” It’s the people who are extremely tech savvy, who are using those machines to be more productive or to win wars and things like that. There would be some sort of evolutionary race between human-machine collectives.

Joshua: I think it’s possible that people who are technologically enhanced could have a competitive advantage and set off an economic arms race or perhaps even literal arms race of a kind that we haven’t seen. I hesitate to say, “Oh, that’s definitely going to happen.” I’m just saying it’s a possibility that makes a certain kind of sense.

Ariel: Do either of you have ideas on how we can continue to advance AI and address these divisive issues?

Iyad: There are two new tools at our disposal: experimentation and machine-augmented regulation.

Today, [there are] cars with a bull bar in front of them. These metallic bars at the front of the car increase safety for the passenger in the case of collision, but they have disproportionate impact on other cars, on pedestrians and cyclists, and they’re much more likely to kill them in the case of an accident. As a result, by making this comparison, by identifying that cars with bull bars are worse for certain group, the trade off was not acceptable, and many countries have banned them, for example the UK, Australia, and many European countries.

If there was a similar trade off being caused by a software feature, then, we wouldn’t know unless we allowed for experimentation as well as monitoring – if we looked at the data to identify whether a particular algorithm is making for very safe cars for customers, but at the expense of a particular group.

In some cases, these systems are going to be so sophisticated and the data is going to be so abundant that we won’t be able to observe them and regulate them in time. Think of algorithmic trading programs. No human being is able to observe these things fast enough to intervene, but you could potentially insert another algorithm, a regulatory algorithm or an oversight algorithm, that will observe other AI systems in real time on our behalf, to make sure that they behave.

Joshua: There are two general categories of strategies for making things go well. There are technical solutions to things and then there’s the broader social problem of having a system of governance that can be counted on to produce outcomes that are good for the public in general.

The thing that I’m most worried about is that if we don’t get our politics in order, especially in the United States, we’re not going to have a system in place that’s going to be able to put the public’s interest first. Ultimately, it’s going to come down to the quality of the government that we have in place, and quality means having a government that distributes benefits to people in what we would consider a fair way and takes care to make sure that things don’t go terribly wrong in unexpected ways and generally represents the interests of the people.

I think we should be working on both of these in parallel. We should be developing technical solutions to more localized problems where you need an AI solution to solve a problem created by AI. But I also think we have to get back to basics when it comes to the fundamental principles of our democracy and preserving them.

Ariel: As we move towards smarter and more ubiquitous AI, what worries you most and what are you most excited about?

Joshua: I’m pretty confident that a lot of labor is going to be displaced by artificial intelligence. I think it is going to be enormously politically and socially disruptive, and I think we need to plan now. With self-driving cars especially in the trucking industry, I think that’s going to be the first and most obvious place where millions of people are going to be out of work and it’s not going to be clear what’s going to replace it for them.

I’m excited about the possibility of AI producing value for people in a way that has not been possible before on a large scale. Imagine if anywhere in the world that’s connected to the Internet, you could get the best possible medical diagnosis for whatever is ailing you. That would be an incredible life-saving thing. And as AI teaching and learning systems get more sophisticated, I think it’s possible that people could actually get very high quality educations with minimal human involvement and that means that people all over the world could unlock their potential. And I think that that would be a wonderful transformative thing.

Iyad: I’m worried about the way in which AI and specifically autonomous weapons are going to alter the calculus of war. In order to aggress on another nation, you have to mobilize humans, you have to get political support from the electorate, you have to handle the very difficult process of bringing back people in coffins, and the impact that this has on electorates.

This creates a big check on power and it makes people think very hard about making these kinds of decisions. With AI, when you’re able to wage wars with very little loss to life, especially if you’re a very advanced nation that is at the forefront of this technology, then you have disproportionate power. It’s kind of like a nuclear weapon, but maybe more because it’s much more customizable. It’s not an all out or nothing – you could start all sorts of wars everywhere.

I think it’s going to be a very interesting shift in the way superpowers think about wars and I worry that this might make them trigger happy. I think a new social contract needs to be written so that this power is kept in check and that there’s more thought that goes into this.

On the other hand, I’m very excited about the abundance that will be created by AI technologies. We’re going to optimize the use of our resources in many ways. In health and in transportation, in energy consumption and so on, there are so many examples in recent years in which AI systems are able to discover ways in which even the smartest humans haven’t been able to optimize.

Ariel: One final thought: This podcast is going live on Halloween, so I want to end on a spooky note. And quite conveniently, Iyad’s group has created Shelley, which is a Twitter chatbot that will help you craft scary ghost stories. Shelley is, of course, a nod to Mary Shelley who wrote Frankenstein, which is the most famous horror story about technology. Iyad, I was hoping you could tell us a bit about how Shelley works.

Iyad: Yes, well this is our second attempt at doing something spooky for Halloween. Last year we launched the nightmare machine, which was using deep neural networks and style transfer algorithms to take ordinary photos and convert them into haunted houses and zombie-infested places. And that was quite interesting; it was a lot of fun. More recently, now we’ve launched Shelley, which people can visit on shelley.ai, and it is named after Mary Shelley who authored Frankenstein.

This is a neural network that generates text and it’s been trained on a very large data set of over 100 thousand short horror stories from a subreddit called No Sleep. And so it’s basically got a lot of human knowledge about what makes things spooky and scary, and the nice thing is that it generates part of the story and people can tweet back at it a continuation of the story and then basically take turns with the AI to craft stories. And we feature those stories on the website afterwards. if I’m correct, this is the first collaborative human-AI horror writing exercise ever.