State of California Endorses Asilomar AI Principles

On August 30, the State of California unanimously adopted legislation in support of the Future of Life Institute’s Asilomar AI Principles.

The Asilomar AI Principles are a set of 23 principles intended to promote the safe and beneficial development of artificial intelligence. The principles – which include research issues, ethics and values, and longer-term issues – emerged from a collaboration between AI researchers, economists, legal scholars, ethicists, and philosophers in Asilomar, California in January of 2017.

The Principles are the most widely adopted effort of their kind. They have been endorsed by AI research leaders at Google DeepMind, Google Brain, Facebook, Apple, and OpenAI. Signatories include Demis Hassabis, Yoshua Bengio, Elon Musk, Ray Kurzweil, the late Stephen Hawking, Tasha McCauley, Joseph Gordon-Levitt, Jeff Dean, Tom Gruber, Anthony Romero, Stuart Russell, and more than 3,800 other AI researchers and experts.

With ACR 215 passing the State Senate with unanimous support, the California Legislature has now been added to that list.

Assemblyman Kevin Kiley, who led the effort, said, “By endorsing the Asilomar Principles, the State Legislature joins in the recognition of shared values that can be applied to AI research, development, and long-term planning — helping to reinforce California’s competitive edge in the field of artificial intelligence, while assuring that its benefits are manifold and widespread.”

The third Asilomar AI principle indicates the importance of constructive and healthy exchange between AI researchers and policymakers, and the passing of this resolution highlights the value of that endeavor. While the principles do not establish enforceable policies or regulations, the action taken by the California Legislature is an important and historic show of support across sectors towards a common goal of enabling safe and beneficial AI.

The Future of Life Institute (FLI), the nonprofit organization that led the creation of the Asilomar AI Principles, is thrilled by this latest development, and encouraged that the principles continue to serve as guiding values for the development of AI and related public policy.

“By endorsing the Asilomar AI Principles, California has taken a historic step towards the advancement of beneficial AI and highlighted its leadership of this transformative technology,” said Anthony Aguirre, cofounder of FLI and physics professor at the University of California, Santa Cruz. “We are grateful to Assemblyman Kevin Kiley for leading the charge and to the dozens of co-authors of this resolution for their foresight on this critical matter.”

Profound societal impacts of AI are no longer merely a question of science fiction, but are already being realized today – from facial recognition technology, to drone surveillance, and the spread of targeted disinformation campaigns. Advances in AI are helping to connect people around the world, improve productivity and efficiencies, and uncover novel insights. However, AI may also pose safety and security threats, exacerbate inequality, and constrain privacy and autonomy.

“New norms are needed for AI that counteract dangerous race dynamics and instead center on trust, security, and the common good,” says Jessica Cussins, AI Policy Lead for FLI. “Having the official support of California helps establish a framework of shared values between policymakers, AI researchers, and other stakeholders. FLI encourages other governmental bodies to support the 23 principles and help shape an exciting and equitable future.”

Podcast: Artificial Intelligence – Global Governance, National Policy, and Public Trust with Allan Dafoe and Jessica Cussins

Experts predict that artificial intelligence could become the most transformative innovation in history, eclipsing both the development of agriculture and the industrial revolution. And the technology is developing far faster than the average bureaucracy can keep up with. How can local, national, and international governments prepare for such dramatic changes and help steer AI research and use in a more beneficial direction?

On this month’s podcast, Ariel spoke with Allan Dafoe and Jessica Cussins about how different countries are addressing the risks and benefits of AI, and why AI is such a unique and challenging technology to effectively govern. Allan is the Director of the Governance of AI Program at the Future of Humanity Institute, and his research focuses on the international politics of transformative artificial intelligence. Jessica is an AI Policy Specialist with the Future of Life Institute, and she’s also a Research Fellow with the UC Berkeley Center for Long-term Cybersecurity, where she conducts research on the security and strategy implications of AI and digital governance.

Topics discussed in this episode include:

  • Three lenses through which to view AI’s transformative power
  • Emerging international and national AI governance strategies
  • The risks and benefits of regulating artificial intelligence
  • The importance of public trust in AI systems
  • The dangers of an AI race
  • How AI will change the nature of wealth and power

You can listen to the podcast above and read the full transcript below. You can check out previous podcasts on SoundCloud, iTunes, GooglePlay, and Stitcher.


Ariel: Hi there, I’m Ariel Conn with the Future of Life Institute. As we record and publish this podcast, diplomats from around the world are meeting in Geneva to consider whether to negotiate a ban on lethal autonomous weapons. As a technology that’s designed to kill people, it’s no surprise that countries would consider regulating or banning these weapons, but what about all other aspects of AI? While most, if not all, AI researchers are designing the technology to improve health, ease strenuous or tedious labor, and generally improve our well-being, most researchers also acknowledge that AI will be transformative, and if we don’t plan ahead, those transformations could be more harmful than helpful.

We’re already seeing instances in which bias and discrimination have been enhanced by AI programs. Social media algorithms are being blamed for impacting elections; it’s unclear how society will deal with the mass unemployment that many fear will be a result of AI developments, and that’s just the tip of the iceberg. These are the problems that we already anticipate and will likely arrive with the relatively narrow AI we have today. But what happens as AI becomes even more advanced? How can people, municipalities, states, and countries prepare for the changes ahead?

Joining us to discuss these questions are Allan Dafoe and Jessica Cussins. Allan is the Director of the Governance of AI program at the Future of Humanity Institute, and his research focuses on the international politics of transformative artificial intelligence. His research seeks to understand the causes of world peace, particularly in the age of advanced artificial intelligence.

Jessica is an AI Policy Specialist with the Future of Life Institute, where she explores AI policy considerations for near and far term. She’s also a Research Fellow with the UC Berkeley Center for Long-term Cybersecurity, where she conducts research on the security and strategy implications of AI and digital governance. Jessica and Allan, thank you so much for joining us today.

Allan: Pleasure.

Jessica: Thank you, Ariel.

Ariel: I want to start with a quote, Allan, that’s on your website and also on a paper that you’re working on that we’ll get to later, where it says, “AI will transform the nature of wealth and power.” And I think that’s sort of at the core of a lot of the issues that we’re concerned about in terms of what the future will look like and how we need to think about what impact AI will have on us and how we deal with that. And more specifically, how governments need to deal with it, how corporations need to deal with it. So, I was hoping you could talk a little bit about the quote first and just sort of how it’s influencing your own research.

Allan: I would be happy to. So, we can think of this as a proposition that may or may not be true, and I think we could easily spend the entire time talking about the reasons why we might think it’s true and the character of it. One way to motivate it, as I think has been the case for people, is to consider that it’s plausible that artificial intelligence would at some point be human-level in a general sense, and to recognize that that would have profound implications. So, you can start there, as, for example, if you were to read Superintelligence by Nick Bostrom, you sort of start at some point in the future and reflect on how profound this technology would be. But I think you can also motivate this with much more near-term perspective and thinking of AI more in a narrow sense.

So, I will offer three lenses for thinking about AI and then I’m happy to discuss it more. The first lens is that of general purpose technology. Economists and others have looked at AI and seen that it seems to fit the category of general purpose technology, which are classes of technologies that provide a crucial input to many important processes, economic, political, and military, social, and are likely to generate these complementary innovations in other areas. And general purpose technologies are also often used as a concept to explain economic growth, so you have things like the railroad or steam power or electricity or the motor vehicle or the airplane or the computer, which seem to change these processes that are important, again, for the economy or for society or for politics in really profound ways. And I think it’s very plausible that artificial intelligence not only is a general purpose technology, but is perhaps the quintessential general purpose technology.

And so in a way that sounds like a mundane statement. General purpose, it will sort of infuse throughout the economy and political systems, but it’s also quite profound because when you think about it, it’s like saying it’s this core innovation that generates a technological revolution. So, we could say a lot about that, and maybe I should, just to give a bit more color. I think Kevin Kelly has a nice quote where he says, “Everything that we formerly electrified, we will now cognitize. There’s almost nothing we can think of that cannot be made new, different, or interesting by infusing it with some extra IQ.” We could say a lot more about general purpose technologies and why they’re so transformative to wealth and power, but I’ll move on to the other two lenses.

The second lens is to think about AI as an information and communication technology. You might think this is a subset of general purpose technologies. So, other technologies in that reference class would include the printing press, the internet, and the telegraph. And these are important because they change, again, sort of all of society and the economy. They make possible new forms of military, new forms of political order, new forms of business enterprise, and so forth. So we could say more about that, and those have important properties related to inequality and some other characteristics that we care about.

But I’ll just move on to the third lens, which is that of intelligence. So, unlike every other general purpose technology, which applied to energy, production, or communication or transportation, AI is a new kind of general purpose technology. It changes the nature of our cognitive processes, it enhances them, it makes them more autonomous, generates new cognitive capabilities. And I think it’s that lens that makes it seem especially transformative. In part because the key role that humans play in the economy is increasingly as cognitive agents, so we are now building powerful complements to us, but also substitutes to us, and so that gives rise to the concerns about labor displacement and so forth. But also innovations in intelligence are hard things to forecast how they will work and what those implications will be for everything, and so that makes it especially hard to sort of see what’s through the mist of the future and what it will bring.

I think there’s a lot of interesting insights that come from those three lenses, but that gives you a sense of why AI could be so transformative.

Ariel: That’s a really nice introduction to what we want to talk about, which is, I guess, okay so then what? If we have this transformative technology that’s already in progress, how does society prepare for that? I’ve brought you both on because you deal with looking at the prospect of AI governance and AI policy, and so first, let’s just look at some definitions, and that is, what is the difference between AI governance and AI policy?

Jessica: So, I think that there are no firm boundaries between these terms. There’s certainly a lot of overlap. AI policy tends to be a little bit more operational, a little bit more finite. We can think of direct government intervention more for the sake of public service. I think governance tends to be a slightly broader term, can relate to industry norms and principles, for example, as well as government-led initiatives or regulations. So, it could be really useful as a kind of multi-stakeholder lens in bringing different groups to the table, but I don’t think there’s firm boundaries between these. I think there’s a lot of interesting work happening under the framework of both, and depending on what the audience is and the goals of the conversation, it’s useful to think about both issues together.

Allan: Yeah, and to that I might just add that governance has a slightly broader meaning, so whereas policy often sort of connotes policies that companies or governments develop intentionally and deploy, governance refers to those, but also sort of unintended policies or institutions or norms and just latent processes that shape how the phenomenon develops. So how AI develops and how it’s deployed, so everything from public opinion to the norms we set up around artificial intelligence and sort of emergent policies or regulatory environments. All of that you can group within governance.

Ariel: One more term that I want to throw in here is the word regulation, because a lot of times, as soon as you start talking about governance or policy, people start to worry that we’re going to be regulating the technology. So, can you talk a little bit about how that’s not necessarily the case? Or maybe it is the case.

Jessica: Yeah, I think what we’re seeing now is a lot of work around norm creation and principles of what ethical and safe development of AI might look like, and that’s a really important step. I don’t think we should be scared of regulation. We’re starting to see examples of policies come into place. A big important example is the GDPR that we saw in Europe that regulates how data can be accessed and used and controlled. We’re seeing increasing examples of these kinds of regulations.

Allan: Another perspective on these terms is that in a way, regulation is a subset, a very small subset, of what governance consists of. So regulation might be especially deliberate attempts by government to shape market behavior or other kinds of behavior, and clearly regulation is sometimes not only needed, but essential for safety and to avoid market failure and to generate growth and other sorts of benefits. But regulation can be very problematic, as you sort of alluded to, for a number of reasons. In general, with technology — and technology’s a really messy phenomenon — it’s often hard to forecast what the next generation of technology will look like, and it’s even harder to forecast what the implications will be for different industries, for society, for political structures.

And so because of that, designing regulation can often fail. It can be misapplied to sort of an older understanding of the technology. Often, the formation of regulation may not be done with a really state-of-the-art understanding of what the technology consists of, and then because technology, and AI in particular, is often moving so quickly, there’s a risk that regulation is sort of out of date by the time it comes into play. So, there are real risks of regulation, and I think a lot of policymakers are aware of that, but also markets do fail and there are really profound impacts of new technologies not only on consumer safety, but in fairness and other ethical concerns, but also more profound impacts, as I’m sure we’ll get to, like the possibility that AI will increase inequality within countries, between people, between countries, between companies. It could generate oligopolistic or monopolistic market structures. So there are these really big challenges emerging from how AI is changing the market and how society should respond, and regulation is an important tool there, but it needs to be done carefully.

Ariel: So, you’ve just brought up quite a few things that I actually do want to ask about. I think the first one that I want to go to is this idea that AI technology is developing a lot faster than the pace of government, basically. How do we deal with that? How do you deal with the fact that something that is so transformative is moving faster than a bureaucracy can handle it?

Allan: This is a very hard question. We can introduce a concept from economics, which is useful, and that is of an externality. So, an externality is some process that when two market actors transact, I buy a product from a seller, it impacts on a third party, so maybe we produce pollution or I produce noise or I deplete some resource or something like that. And policy often should focus on externalities. Those are the sources of market failure. Negative externalities are the ones like pollution that you want to tax or restrict or address, and then positive externalities like innovation are ones you want to promote, you want to subsidize and encourage. And so one way to think about how policy should respond to AI is to look at the character of the externalities.

If the externalities are local and if the sort of relevant stakeholder community is local, then I think a good general policy is to allow a local authority to develop to the lowest level that you can, so you want municipalities or even smaller groups to implement different regulatory environments. The purpose for that is not only so that the regulatory environment is adapted to the local preferences, but also you generate experimentation. So maybe one community uses AI in one way and another employs it in another way, and then over time, we’ll start seeing which approaches work better than others. So, as long as the externalities are local, then that’s, I think, what we should do.

However, many of these externalities are at least national, but most of them actually seem to be international. Then it becomes much more difficult. So, if the externalities are at the country level, then you need country level policy to optimally address them, and then if they’re transnational, international, then you need to negotiate with your neighbors to converge on a policy, and that’s when you get into much greater difficulty because you have to agree across countries and jurisdictions, but also the stakes are so much greater if you get the policy wrong, and you can’t learn from the sort of trial and error of the process of local regulatory experimentation.

Jessica: I just want to push back a little bit on this idea. I mean, if we take regulation out of it for a second and think about the speed at which AI research is happening and kind of policy development, the people that are conducting AI research, it’s a human endeavor, so there are people making decisions, there are institutions that are involved that rely upon existing power structures, and so this is already kind of embedded in policy, and there are political and ethical decisions just in the way that we’re choosing to design and build this technology from the get-go. So all of that’s to say that thinking about policy and ethics as part of that design process I think is really useful and just to not have them as always opposing factors.

One of the things that can really help in this is just improving those communication channels between technologists and policymakers so there isn’t such a wide gulf between these worlds and these conversations that are happening and also bringing in social scientists and others to join in on those conversations.

Allan: I agree.

Ariel: I want to take some of these ideas and look at where we are now. Jessica, you put together a policy resource that covers a lot of efforts being made internationally looking at different countries, within countries, and then also international efforts, where countries are working together to try to figure out how to address some of these AI issues that will especially be cropping up in the very near term. I was wondering if you could talk a little bit about what the current state of AI policy is today.

Jessica: Sure. So this is available publicly. It’s also available on the Future of Life homepage. And the idea here is that this is a living resource document, so this is being updated regularly and it’s mapping AI policy developments as they’re happening around the world, so it’s more of an empirical exercise in that way, kind of seeing how different groups and institutions, as well as nations, are framing and addressing these challenges. So, in most cases, we don’t have concrete policies on the ground yet, but we do have strategies, we have frameworks for addressing these challenges, and so we’re mapping what’s happening in that space and hoping that it encourages transparency and also collaboration between actors, which we think is important.

There are three complementary resources that are part of this resource. The first one is a map of national and international strategies, and that includes 27 countries and 6 international initiatives. The second resource is a compilation of AI policy challenges, and this is broken down into 14 different issues, so this ranges from economic impacts and technological unemployment to issues like surveillance and privacy or political manipulation and computational propaganda, and if you click on each of these different challenges, it actually links you with relevant policy principles and recommendations. So, the idea is if you’re a policymaker or you’re interested in this, you actually have some guidance. What are people in the field thinking about ways to address these challenges?

And then the third resource there is a set of reading lists. There are dozens of papers, reports, and articles that are relevant to AI policy debates. We have seven different categories here that include things like AI policy overviews or papers that delve into the security and existential risks of AI. So, this is a good starting place if you’re thinking about how to get involved in AI policy discussions.

Ariel: Can you talk a little bit about some of maybe the more interesting programs that you’ve seen developing so far?

Jessica: So, I mean the U.S. is really interesting right now. There’s been some recent developments. The 2019 National Defense Authorization Act was just signed last week by President Trump, and so this actually made official a new national security commission on artificial intelligence. So we’re seeing the kind of beginnings of a national strategy for AI within the U.S. through these kinds of developments that don’t really resemble what’s happening in other countries. This is part of the defense department, much more tailored to national defense and national security, so there’s going to be 15 commission members looking at a range of different issues, but particularly with how they relate to national defense.

We also have a new joint AI center in the DoD that will be looking at an ethical framework but for defense technologies using AI, so if you compare this kind of focus to what we’ve seen in France, for example, they have a national strategy for AI. It’s called AI for Humanity, and there’s a lengthy report that goes into numerous different kinds of issues; they’re talking about ecology and sustainability, about transparency, much more of a focus on having state-led developments kind of pushing back against the idea that we can just leave this to the private sector to figure out, which is really where the U.S. is going in terms of the consumer uses of AI. Trump’s priorities are to remove regulatory barriers as it relates to AI technology, so France is markedly different and they want to push back against the company control of data and the uses of these technologies. So, that’s kind of an interesting difference we’re seeing.

Allan: I would like to add that I think Jessica’s overview of global AI policy looks like a really useful resource. There’s a lot of links to most of the key, I think, readings that I would think you’d want to direct someone to, so I really recommend people check that out. And then specifically, I just want to respond to this remark Jessica made about the U.S. approach of letting companies have more of a free rein at developing AI versus the French approach. Especially well articulated by Macron in his Wired interview is the insight that you’re unlikely to be able to develop AI successfully if you don’t have the trust of important stakeholders, and that mostly means the citizens of your country.

And I think Facebook has realized that and is working really hard to regain the trust from citizens and users, and just in general I think, yeah, if AI products are being deployed in an ecosystem where people don’t trust them, that’s going to handicap the deployment of those AI services. There’ll be sort of barriers to their use, there will be opposition regulation that will not necessarily be the most efficient way of generating AI that’s fair or safe or respects privacy. So, I think this conversation between different governmental authorities and the public and NGOs and researchers and companies around what is good AI, what are the norms that we should expect from AI, and then how do we communicate that and enter into a conversation that, between the public and the developers of AI, is really important and is sort of against U.S. national interests to not have that conversation and not develop that trust.

Ariel: I’d actually like to stick with this subject for a minute because trust is something that I find rather fascinating, actually. How big a risk is it, do you think, that the public could decide, “We just don’t trust this technology and we want it to stop,” and if they did decide that, do you think it would actually stop? Or do you think there’s enough government and financial incentive to continue promoting AI that the public trust may not be as big a deal as it has been for some other technologies?

Jessica: I certainly don’t think that there’s gonna be a complete stop from the companies that are developing this technology, but certainly responses from the public and from their employees can shift behavior. We’re seeing at Google and at Amazon that protests from the employees can lead to changes. So in the case of Google, the employees were upset about the involvement with the U.S. military on Project Maven and didn’t want their technology to be used in that kind of weaponized way, and that led Google to publish their own AI ethics principles, which included specifically that they would not renew that contract and that they would not pursue autonomous weapons. There is certainly a back and forth that happens between the public, between employees of companies and where the technology is going. I think we should feel empowered to be part of that conversation.

Allan: Yeah, I would just second that. Investments in AI and in research and development will not stop, certainly globally, but there’s still a lot of interest that could be substantially harmed, including the public interest from the development of valuable AI services and growth from a breakdown in trust. AI services really depend on trust. You see this with the big AI companies that rely on having a large user base and generating a lot of data. So the algorithms often depend on lots of user interaction and having a large user base to do well, and that only works if users are willing to share their data, if they trust that their data is protected and being used appropriately, if there are not political movements to inefficiently, or not in the interest of the public, prevent the accumulation and use of data.

So, that’s one of the big areas, but I think there are a lot of other ways in which a breakdown in trust would harm the development of AI. It will make it harder for start ups to get going. Also, as Jessica mentioned, I think AI researchers are, they’re not just in it for the money. A lot of them have real political convictions, and if they don’t feel like their work is doing good or if they have ethical concerns with how their work is being used, they are likely to switch companies or express their concerns internally as we saw at Google. I think this is really crucial for a country from the national interest perspective. If you want to have a healthy AI ecosystem, you need to develop a regulatory environment that works but also have relationships with key companies and the public that are informed and sort of stays within the bounds of the public interest in terms of all of the range of ethical and other concerns they would have.

Jessica: Two quick additional points on this issue of trust. The first is that policymakers should not assume that the public will necessarily trust their reaction and their approach to dealing with this, and there’s differences in the public policy processes that happen that can enable greater trust. So, for example, I think there’s a lot to learn from the way that France went about developing their strategy. It took place over the course of a year with hundreds of interviews, extremely consultative with members of the public, and that really encourages buy-in from a range of stakeholders, which I think is important. If we’re gonna be establishing policies that stick around, to have that buy-in not only from industry but also from the publics that are implicated and impacted by these technologies.

A second point is just the importance of norms that we’re seeing in creating cultures of trust, and I don’t want to overstate this, but it’s sort of a first step, and I think we also need monitoring services, we need accountability, we need ways to actually check that these norms aren’t just kind of disappearing into the ether but are upheld in some way. But that being said, they are an important first step, and so I think things like the Asilomar AI principles which were again, a very consultative process that were developed by a large number of people and iterated upon, and only those that had quite a lot of consensus made it into the final principles. We’ve seen thousands of people sign onto those. We’ve seen them being referenced around the world, so those kinds of initiatives are important in kind of helping to establish frameworks of trust.

Ariel: While we’re on this topic, you’ve both been sort of getting into roles of different stakeholders in developing policy and governance, and I’d like to touch on that more explicitly. We have, obviously governments, we have corporations, academia, NGOs, individuals. What are the different roles that these different stakeholders play and do you have tips for how these different stakeholders can try to help implement better and more useful policy?

Allan: Maybe I’ll start and then turn it over to Jessica for the comprehensive answer. I think there’s lots of things that can be said here, and really most actors should be involved in multiple ways. The one I want to highlight is I think the leading AI companies are in a good position to be leaders in shaping norms and best practice and technical understanding and recommendations for policies and regulation. We’re actually quite fortunate that many of them are doing an excellent job with this, so I’ll just call out one that I think is commendable in the extent to which it’s being a good corporate citizen and that’s Alphabet. I think they’ve developed their self-driving car technology in the right way, which is to say, carefully. Their policy towards patents is, I think, more in the public interest and that is that they oppose offensive patent litigation and have really sort of invested in opposing that. You can also tell a business case story for why they would do that. I think they’ve supported really valuable AI research that otherwise groups like FLI or other sort of public interest funding sources would want to support. As an example, I’ll offer Chris Olah, in Google Brain, who has done work on transparency and legibility of neural networks. This is highly technical but also extremely important for safety in the near and long-term. This is the kind of thing that we’ll need to figure out to have confidence that really advanced AI is safe and working in our interest, but also in the near-term for understanding things like, “Is this algorithm fair or what was it doing and can we audit it?”

And then one other researcher I would flag, also at Google Brain, is Moritz Hardt, who has done some excellent work on fairness. And so here you have Alphabet supporting AI researchers who are doing, really I think, frontier work on the ethics of AI and developing technical solutions. And then of course, Alphabet’s been very good with user data and in particular, DeepMind, I think, has been a real leader in safety, ethics, and AI for good. So I think the reason I’m saying this is because I think we should develop a norm, a strong norm that says, “Companies who are the leading beneficiaries of AI services in terms of profit have a social responsibility to exemplify best practice,” and we should call out the ones who are doing a good job and also the ones that are doing bad jobs and encourage the ones that are not doing good jobs to do better, first through norms and then later through other instruments.

Jessica: I absolutely agree with that. I think that we are seeing a lot of leadership from companies and small groups, as well, not even just the major players. Just a couple days ago, an AI marketing company released an AI ethics policy and just said, “Actually, we think every AI company should do this, and we’re gonna start and say that we won’t use negative emotions to exploit people, for example, and that we’re gonna take action to avoid prejudice and bias.” I think these are really important ways to establish as best practices exactly as you said.

The only other thing I would say is that more than other technologies in the past, AI is really being led by a small handful of companies at the moment in terms of the major advances. So I think that we will need some external checks on some of the processes that are happening. If we kind of analyze the topics that come up, for example, in the AI ethics principles coming from companies, not every issue is being talked about. I think there certainly is an important role for governments and academia and NGOs to get involved and point out those gaps and help kind of hold them accountable.

Ariel: I want to transition now a little bit to talk about Allan, some of the work that you are doing at the Governance of AI program. You also have a paper that I believe will be live when this podcast goes live. I’d like you to talk a little bit about what you’re doing there and also maybe look at this transition of how we go from governance of this narrow AI that we have today to looking at how we deal with more advanced AI in the future.

Allan: Great. So the Governance of AI Program is a unit within the Future of Humanity Institute at the University of Oxford. The Future of Humanity Institute was founded by Nick Bostrom, and he’s the Director, and he’s also the author of Superintelligence. So you can see a little bit from that why we’re situated there. The Future of Humanity Institute is actually full of really excellent scholars thinking about big issues, as the title would suggest. And many of them converged on AI as an important thing to think through, an important phenomenon to think through, for the highest stakes considerations. Almost no matter what is important to you, over the time scale of say, four decades and certainly further into the future, AI seems like it will be really important for realizing or failing to realize those things that are important to you.

So, we are primarily focused on the highest stakes governance challenges arising from AI, and that’s often what we’re indicating when we talk about transformative AI. Is that we’re really trying to focus on the kinds of AI, the developments in AI, and maybe this is several decades in the future, that will radically transform wealth and power and safety and world order and other values. However, I think you can motivate a lot of this work by looking at near-term AI, so we could talk about a lot of developments in near-term AI and how they suggest the possibilities for really transformative impacts. I’ll talk through a few of those or just mention a few.

One that we’ve touched on a little bit is labor displacement and inequality. This is not science fiction to talk about the impact of automation and AI on inequality. Economists are now treating this as a very serious hypothesis, and I would say the bulk of belief within the economics community is that AI will at least pose displacement challenges to labor, if not more serious challenges in terms of persistent unemployment.

Secondarily is the issue of inequality that there’s a number of features of AI that seem like they could increase inequality. The main one that I’ll talk about is that digital services in general, but AI in particular, have what seems like a natural global monopoly structure. And this is because the provision of an AI service, like a digital service, often has a very low marginal cost. So it’s effectively free for Netflix to give me a movie. In a market like that for Netflix or for Google Search or for Amazon e-commerce, the competition is all in the fixed cost of developing the really good AI “engine” and then whoever develops the best one can then outcompete and sort of capture the whole market. And then the size of the market really depends on if there’s sort of cultural or consumer heterogeneity.

All of this to say, we see these AI giants, the three in China and the handful in the U.S. Europe, for example, is really concerned that they don’t have an AI giant, and they’re wondering how do they produce an AI champion. And it’s plausible that a combination of factors means it’s actually going to be very hard for Europe to generate the next AI champion. So this has important geopolitical implications, economic implications, implications for welfare of citizens in these countries, implications for tax.

Everything I’m saying right now is really, I think, motivated by near-term and quite credible possibilities. We can then look to other possibilities, which seem more like science fiction but are happening today. For example, the possibilities around surveillance and control from AI and from autonomous weapons, I think, are profound. So, if you have a country or any authority, that could be a company as well, that is able to deploy surveillance systems that can be surveilling your online behavior, for example your behavior on Facebook or your behavior at the workplace. When I leave my chair, if there’s a camera in my office, it can watch if I’m working and what I’m doing, and then of course my behavior in public spaces and elsewhere, then the authority can really get a lot of information on the person who’s being surveilled. And that could have profound implications for the power relations between governments and publics or companies and publics.

And this is the fundamental problem of politics, is how do you build this leviathan, this powerful organization that doesn’t abuse its power. And we’ve done pretty well in many countries developing institutions to discipline the leviathan so that it doesn’t abuse its power, but AI is now providing this dramatically more powerful surveillance tool and then sort of coercion tool, and so that could, say, at the least, enable leaders of totalitarian regimes to really reinforce their control over their country. More worryingly, it could lead to sort of an authoritarian sliding in countries that are less robustly democratic, and even countries that are pretty democratic, they might still worry about how it will shift power between different groups. And that’s another issue area, which again is, the stakes are tremendous, but we’re not invoking sort of radical advances in AI to get there.

And there’s actually some more that we could talk about, such as strategic stability, but I’ll skip it. Those are sort of all the challenges from near-term AI — AI as we see it today or likely it’s going to be coming in five years. But AI’s developing quickly, and we really don’t know how far it could go, how quickly. And so it’s important to also think about surprises. Where might we be in 10, 15, 20 years? And this is obviously very difficult, but I think, as you’ve mentioned, because it’s moving so quickly, it’s important that some people, scholars and policymakers, are looking down the tree a little bit farther to try to anticipate what might be coming and what we could do today to steer in a better direction.

So, at the Governance of AI Program, we work on every aspect of the development and deployment and regulation and norms around AI that we see as bearing on the highest stakes issues. And this document that you mentioned, it’s entitled AI Governance: A Research Agenda, is an attempt to articulate the space of issues that people could be working on that we see as potentially touching on these high stakes issues.

Ariel: One area that I don’t think you mentioned that I would like to ask about is the idea of an AI race. Why is that a problem, and what can we do to try to prevent an AI race from happening?

Allan: There’s this phenomenon that we might call the AI race, which has many layers and many actors, and this is the phenomenon where actors (those could be an AI researcher, they could be a lab, they could be a firm, they could be a country or even a region like Europe) perceive that they need to work really hard, invest resources, and move quickly to gain an advantage in AI — in AI capabilities, in AI innovations, deploying AI systems, entering a market — because if they don’t, they will lose out on something important to them. So, that could be, for the researchers, it could be prestige, right? “I won’t get the publication.” For firms it could be both prestige and maybe financial support. It could be a market. You might capture or fail to capture a really important market.

And then for countries, there’s a whole host of motivations. Everything from making sure there’s industries in our country for our workers to having companies that pay tax revenue so that the idea is if we have an AI champion, then we will have more taxable revenue but also other advantages. There’ll be more employment. Maybe we can have a good relationship with that champion and that will help us in other policy domains. And then, of course, there’s the military considerations that if AI becomes an important complement to other military technologies or even crucial tech in itself, then countries are often worried about falling behind and being inferior and always looking towards what might be the next source of advantage. So, that’s another driver for this sense that countries want to not fall behind and get ahead.

Jessica: We’re seeing competing interests at the moment. There are nationalistic kinds of tendencies coming up. We’re seeing national strategies emerging from all over the world, and there’s really strong economic and military motivations for countries to take this kind of stance. We’ve got Russian President Vladimir Putin telling students that whoever leads artificial intelligence will be the ruler of the world. We’ve got China declaring a national policy that they intend to be the global leader in AI by 2030, and other countries as well. Trump has said that he intends for the U.S. to be the global leader. The U.K. has said similar things.

So, there’s a lot of that kind of rhetoric coming from nations at the moment, and they do have economic and military motivations to say that. They’re competing for a relatively small number of AI researchers and a restricted talent pool, and everybody’s searching for that competitive advantage. That being said, as we see AI develop, particularly from more narrow applications to potential more generalized ones, the need for international cooperation, as well as more robust safety and reliability controls, are really going to increase, and so I think there are some emerging signs of international efforts that are really important to look to, and hopefully we’ll see that outweigh some of the competitive race dynamics that we’re seeing now.

Allan: The sort of crux of the problem is if everyone’s driving to achieve this performance achievement, they want to have the next most powerful system, and if there’s any other value that they might care about or society might care about, that’s sort of in the way or that there’s a trade-off. They have an incentive to trade away some of that value to gain a performance lead. Things that we see today, like privacy, so maybe countries that have a stricter privacy policy may have troubles generating an AI champion. Some look to China and see that maybe China has an AI advantage because it has such a cohesive national culture and close relationship between government and the private sector, as compared with, say, the United States, where you can see a real conflict at times between, say, Alphabet and parts of the U.S. government, which I think the petition around Project Maven really illustrates.

So, values you might lose include privacy or maybe not developing autonomous weapons, according to some ethical guidelines that you would want. There’s other concerns that put people’s lives at stake, so if you’re rushing to market with a self-driving car that isn’t sufficiently safe, then people can die. And the small numbers, they’re independent risks, but if say the risk that you’re deploying is that the self-driving car system itself is hackable at scale, then you might be generating a new weapon of mass destruction. So, there’s these accident risks or malicious use risks that are pretty serious, and then when you really start looking towards AI systems that would be very intelligent, hard for us to understand because they’re sort of opaque, complex, fast moving when they’re plugged into financial systems, energy grids, cyber systems, cyber defense, there’s an increasing risk that we won’t even know what risks we’re exposing ourselves to because of these highly complex interdependent, fast-moving systems.

And so if we could sort of all take a breath and reflect a little bit, that might be more optimal from everyone’s perspective. But because there’s this perception of a prize to be had, it seems likely that we are going to be moving more quickly than is optimal. It’s a very big challenge. It won’t be easily solved, but in my view, it is the most important issue for us to be thinking about and working towards over the coming decades, and if we solve it, I think we’re much more likely to develop beneficial advanced AI, which will help us solve all our other problems. So I really see this as the global issue of our era to work on.

Ariel: We sort of got into this a little bit earlier, but what are some of the other countries that have policies that you think maybe more countries should be implementing? And maybe more specifically, if you could speak about some of the international efforts that have been going on.

Jessica: Yeah, so an interesting thing we’re seeing from the U.K. is that they’ve established a center for data ethics and innovation, and they’re really making an effort to prioritize ethical considerations of AI. So I think it remains to be seen exactly what that looks like, but that’s an important element to keep in mind. Another interesting thing to watch, Estonia is working on an AI law at the moment, so they’re trying to make very clear guidelines so that when companies come in and they want to work on AI technology in that country, they know exactly what the framework they’re working in will be like, and they actually see that as something that can help encourage innovations. I think that’ll be a really important one to watch, as well.

But there’s a lot of great work happening. There’s task forces emerging, and not just at the federal level, at the local level, too. New York now has an algorithm monitoring task force and actually trying to see where are algorithms being used in public services and trying to encourage accountability about where those exist, so that’s a really important thing that potentially could spread to other states or other countries.

And then you mentioned international developments, as well. So, there are important things happening here. The E.U. is certainly a great example of this right now. 25 European countries signed a Declaration of Cooperation on AI. This is a plan, a strategy to actually work together to improve research and work collectively on the kind of social and security and legal issues that come up around AI. There’s also, at the G7 meeting, they signed, it’s called the Charlevoix Common Vision for the Future of AI. That again, it’s not regulatory, but setting out a vision that includes things like promoting human-centric AI and fostering public trust, supporting lifelong learning and training, as well as supporting women and underrepresented populations in AI development. So, those kinds of things, I think, are really encouraging.

Ariel: Excellent. And was there anything else that you think is important to add that we didn’t get a chance to discuss today?

Jessica: Just a couple things. There are important ways that government can shape the trajectory of AI that aren’t just about regulation. For example, deciding how to leverage government investment really changes the trajectory of what AI is developed, what kinds of systems people prioritize. That’s a really important policy lever that is different from regulation that we should keep in mind. Another one is around procurement standards. So, when governments want to bring AI technologies into government services, what are they going to be looking for? What are the best practices that they require for that? So, those are important levers.

Another issue just is somewhat taken for granted in this conversation but just to state it, is that, shaping AI for a safe and beneficial future is, we can’t just have technical fixes; these are really built by people, and we’re making choices about how and where they’re deployed and for what purposes, so these are social and political choices. This has to be a multidisciplinary process, and involve governments along with industry and civil society, so really encouraging to see these kinds of conversations take place.

Ariel: Awesome. I think that’s a really nice note to end on. Well, so Jessica and Allan, thank you so much for joining us today.

Allan: Thank you, Ariel, it was a real pleasure. And Jessica, it was a pleasure to chat with you. And thank you to all the good work coming out of FLI promoting beneficial AI.

Jessica: Yeah, thank you so much, Ariel, and thank you Allan, it’s really an honor to be part of this conversation.

Allan: Likewise.

Ariel: If you’ve been enjoying the podcasts, please take a moment to like them, share them, follow us on whatever platform you’re listening to us on. And, I will be back again next month, with a new pair of experts.

Governing AI: An Inside Look at the Quest to Ensure AI Benefits Humanity

Finance, education, medicine, programming, the arts — artificial intelligence is set to disrupt nearly every sector of our society. Governments and policy experts have started to realize that, in order to prepare for this future, in order to minimize the risks and ensure that AI benefits humanity, we need to start planning for the arrival of advanced AI systems today.

Although we are still in the early moments of this movement, the landscape looks promising. Several nations and independent firms have already started to strategize and develop policies for the governance of AI. Last year, the UAE appointed the world’s first Minister of Artificial Intelligence, and Germany took smaller, but similar, steps in 2017, when the Ethics Commission at the German Ministry of Transport and Digital Infrastructure developed the world’s first set of regulatory guidelines for automated and connected driving.

This work is notable; however, these efforts have yet to coalesce into a larger governance framework that extends beyond national boundaries. Nick Bostrom’s Strategic Artificial Intelligence Research Center seeks to assist in resolving this issue by understanding, and ultimately shaping, the strategic landscape of long-term AI development on a global scale.


Developing a Global Strategy: Where We Are Today

The Strategic Artificial Intelligence Research Center was founded in 2015 with the knowledge that, to truly circumvent the threats posed by AI, the world needs a concerted effort focused on tackling unsolved problems related to AI policy and development. The Governance of AI Program (GovAI), co-directed by Bostrom and Allan Dafoe, is the primary research program that has evolved from this center. Its central mission, as articulated by the directors, is to “examine the political, economic, military, governance, and ethical dimensions of how humanity can best navigate the transition to such advanced AI systems.” In this respect, the program is focused on strategy — on shaping the social, political, and governmental systems that influence AI research and development — as opposed to focusing on the technical hurdles that must be overcome in order to create and program safe AI.

To develop a sound AI strategy, the program works with social scientists, politicians, corporate leaders, and artificial intelligence/machine learning engineers to address questions of how we should approach the challenge of governing artificial intelligence. In a recent 80,000 Hours podcast with Rob Wiblin, Dafoe outlined how the team’s research shapes up from a practical standpoint, asserting that the work focuses on answering questions that fall under three primary categories:

  • The Technical Landscape: This category seeks to answer all the questions that are related to research trends in the field of AI with the aim of understanding what future technological trajectories are plausible and how these trajectories affect the challenges of governing advanced AI systems.
  • AI Politics: This category focuses on questions that are related to the dynamics of different groups, corporations, and governments pursuing their own interests in relation to AI, and it seeks to understand what risks might arise as a result and how we may be able to mitigate these risks.
  • AI Governance: This category examines positive visions of a future in which humanity coordinates to govern advanced AI in a safe and robust manner. This raises questions such as how this framework should operate and what values we would want to encode in a governance regime.

The above categories provide a clearer way of understanding the various objectives of those invested in researching AI governance and strategy; however, these categories are fairly large in scope. To help elucidate the work they are performing, Jade Leung, a researcher with GovAI and a DPhil candidate in International Relations at the University of Oxford, outlined some of the specific workstreams that the team is currently pursuing.

One of the most intriguing areas of research is the Chinese AI Strategy workstream. This line of research examines things like China’s AI capabilities vis-à-vis other countries, official documentation regarding China’s AI policy, and the various power dynamics at play in the nation with an aim of understanding, as Leung summarizes, “China’s ambition to become an AI superpower and the state of Chinese thinking on safety, cooperation, and AGI.” Ultimately, GovAI seeks to outline the key features of China’s AI strategy in order to understand one of the most important actors in AI governance. In March of 2018, the program published Deciphering China’s AI Dream, a report that analyzes new features of China’s national AI strategy, and it plans to build upon this research in the near future.

Another workstream is Firm-Government Cooperation, which examines the role that private firms play in relation to the development of advanced AI and how these players are likely to interact with national governments. In a recent talk at EA Global San Francisco, Leung focused on how private industry is already playing a significant role in AI development and why, when considering how to govern AI, private players must be included in strategy considerations as a vital part of the equation. The description of the talk succinctly summarizes the key focal areas, noting that “private firms are the only prominent actors that have expressed ambitions to develop AGI, and lead at the cutting edge of advanced AI research. It is therefore critical to consider how these private firms should be involved in the future of AI governance.”

Other work that Leung highlighted includes modeling technology race dynamics and analyzing the distribution of AI talent and hardware globally.


The Road Ahead

When asked how much confidence she has that AI researchers will ultimately coalesce and be successful in their attempts to shape the landscape of long-term AI development internationally, Leung was cautious with her response, noting that far more hands are needed. “There is certainly a greater need for more researchers to be tackling these questions. As a research area as well as an area of policy action, long-term safe and robust AI governance remains a neglected mission,” she said.

Additionally, Leung noted that, at this juncture, although some concrete research is already underway, a lot of the work is focused on framing issues related to AI governance and, in so doing, revealing the various avenues in need of research. As a result, the team doesn’t yet have concrete recommendations for specific actions governing bodies should commit to, as further foundational analysis is needed. “We don’t have sufficiently robust and concrete policy recommendations for the near term as it stands, given the degrees of uncertainty around this problem,” she said.

However, both Leung and Dafoe are optimistic and assert that this information gap will likely change, and rapidly. Researchers across disciplines are increasingly becoming aware of the significance of this topic, and as more individuals begin researching and participating in this community, the various avenues of research will become more focused. “In two years, we’ll probably have a much more substantial research community. But today, we’re just figuring out what are the most important and tractable problems and how we can best recruit to work on those problems,” Dafoe told Wiblin.

The assurances that a more robust community will likely form soon are encouraging; however, questions remain regarding whether this community will come together with enough time to develop a solid governance framework. As Dafoe notes, we have never witnessed an intelligence explosion before, so we have no examples to look to for guidance when attempting to develop projections and timelines regarding when we will have advanced AI systems.

Ultimately, the lack of projections is precisely why we must significantly invest in AI strategy research in the immediate future. As Bostrom notes in Superintelligence: Paths, Dangers, Strategies, AI is not simply a disruptive technology, it is likely the most disruptive technology humanity will ever encounter: “This is quite possibly the most important and most daunting challenge humanity has ever faced. And — whether we succeed or fail — it is probably the last challenge we will ever face.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Edit: The title of the article has been changed to reflect the fact that this is not about regulating AI.

AI Alignment Podcast: The Metaethics of Joy, Suffering, and Artificial Intelligence with Brian Tomasik and David Pearce

What role does metaethics play in AI alignment and safety? How might paths to AI alignment change given different metaethical views? How do issues in moral epistemology, motivation, and justification affect value alignment? What might be the metaphysical status of suffering and pleasure?  What’s the difference between moral realism and anti-realism and how is each view grounded?  And just what does any of this really have to do with AI?

The Metaethics of Joy, Suffering, and AI Alignment is the fourth podcast in the new AI Alignment series, hosted by Lucas Perry. For those of you that are new, this series will be covering and exploring the AI alignment problem across a large variety of domains, reflecting the fundamentally interdisciplinary nature of AI alignment. Broadly, we will be having discussions with technical and non-technical researchers across areas such as machine learning, AI safety, governance, coordination, ethics, philosophy, and psychology as they pertain to the project of creating beneficial AI. If this sounds interesting to you, we hope that you will join in the conversations by following us or subscribing to our podcasts on Youtube, SoundCloud, or your preferred podcast site/application.

If you’re interested in exploring the interdisciplinary nature of AI alignment, we suggest you take a look here at a preliminary landscape which begins to map this space.

In this podcast, Lucas spoke with David Pearce and Brian Tomasik. David is a co-founder of the World Transhumanist Association, currently rebranded Humanity+. You might know him for his work on The Hedonistic Imperative, a book focusing on our moral obligation to work towards the abolition of suffering in all sentient life. Brian is a researcher at the Foundational Research Institute. He writes about ethics, animal welfare, and future scenarios on his website “Essays On Reducing Suffering.” 

Topics discussed in this episode include:

  • What metaethics is and how it ties into AI alignment or not
  • Brian and David’s ethics and metaethics
  • Moral realism vs antirealism
  • Emotivism
  • Moral epistemology and motivation
  • Different paths to and effects on AI alignment given different metaethics
  • Moral status of hedonic tones vs preferences
  • Can we make moral progress and what would this mean?
  • Moving forward given moral uncertainty

In this interview we discuss ideas contained in the work of Brian Tomasik and David Pearce. You can learn more about Brian’s work here and here, and David’s work here. You can hear more in the podcast above or read the transcript below.

Lucas: Hey, everyone. Welcome back to the AI Alignment podcast series with the Future of Life Institute. Today, we’ll be speaking with David Pearce and Brian Tomasik. David is a co-founder of the World Transhumanist Association, rebranded Humanity+, and is a prominent figure within the transhumanism movement in general. You might know him from his work on The Hedonistic Imperative, a book which explores our moral obligation to work towards the abolition of suffering in all sentient life through technological intervention.

Brian Tomasik writes about ethics, animal welfare, and far-future scenarios from a suffering-focused perspective on his website, Essays On Reducing Suffering. He has also helped found the Foundational Research Institute, which is a think tank that explores crucial considerations for reducing suffering in the long term future. If you have been finding this podcast interesting or useful, remember to follow us on your preferred listening platform and share the episode on social media. Today, Brian, David, and I speak about metaethics, key concepts and ideas in the space, explore the metaethics of Brian and David, and how this all relates to and is important for AI alignment. This was a super fun and interesting episode and I hope that you find it valuable. With that, I give you Brian Tomasik and David Pearce.

Thank you so much for coming on the podcast.

David: Thank you Lucas.

Brian: Glad to be here.

Lucas: Great. We can start off with you David and then, you Brian and just giving a little bit about your background, the intellectual journey that you’ve been on and how that brought you here today.

David: Yes. My focus has always been on the problem of suffering, a very ancient problem; Buddhism and countless other traditions have been preoccupied by the problem of suffering. I’m also a transhumanist, and what transhumanism brings to the problem of suffering is the idea that it’s possible to use technology, in particular biotechnology, to phase out suffering, not just in humans but throughout the living world, and ideally replace it with gradients of intelligent wellbeing. Transhumanism is a very broad movement embracing not just radical mood enrichment but also super longevity and super intelligence. This is what brings me in and us here today, in that there is no guarantee that human preoccupations or the problem of suffering are going to overlap with those of post human super intelligence.

Lucas: Awesome, and so you, Brian.

Brian: I’ve been interested in utilitarianism since I was 18, when I discovered the word. I immediately looked it up and was interested to see that the philosophy mirrored some of the things that I had been thinking about up to that point. I became interested in animal ethics and the far future. A year after that, I actually discovered David’s writings on the Hedonistic Imperative. Along with other factors, his writings helped to inspire me to care more about suffering relative to the creation of happiness. Since then, I’ve been what you might call suffering-focused, which means I think that the reduction of suffering has more moral priority than other values. I’ve written about both animal ethics, including wild animal suffering, as well as risks of astronomical future suffering, what are called s-risks. You had a recent podcast episode with Kaj Sotala to talk about s-risks.

In general, I think that from my perspective, one important thing to think about with regard to AI is what sorts of outcomes could result in large amounts of suffering. We should try to steer away from those possible future scenarios.

Lucas: Given our focus on AI alignment, I’d like to just offer a little bit of context. Today, this episode will be focusing on ethics. The AI alignment problem is traditionally seen as something which is predominantly technical. While a large portion of it is technical, the end towards which the technical AI is aimed, or the ethics which is imbued or embodied within it, is still an open and difficult question. Broadly, just to have everything defined here, we can understand ethics as a method of seeking to understand what we ought to do and what counts as moral or good.

The end goal of AI safety is to create beneficial intelligence, not undirected intelligence. What beneficial exactly entails is still an open question that largely exists in the domain of ethics. Even if all the technical issues surrounding the creation of an artificial general intelligence or super intelligence are solved, we will still face deeply challenging ethical questions that will have tremendous consequences for earth-originating intelligent life. This is what is meant when it is said that we must do philosophy or ethics on a deadline. In the spirit of that, we’re going to be focusing this podcast today on metaethics, and particularly the metaethics of David Pearce and Brian Tomasik, which also happen to be views that are popular, I would say, among people interested in the AI safety community.

I think that Brian and David have enough disagreements that this should be pretty interesting. Again, just going back to this idea of ethics, I think given this goal, ethics can be seen as a lens through which to view safe AI design. It’s also a cognitive architecture to potentially be instantiated in AI through machine ethics. That would potentially make AIs ethical reasoners, ethical decision-makers, or both. Ethics can also be developed, practiced and embodied by AI researchers and their collaborators, and can also be seen as a discipline through which we can guide AI research and adjudicate its impacts in the world.

There is an ongoing debate about what the best path forward is for generating ethical AI, whether it’s the project of machine ethics through bottom-up or top-down approaches, or a broad project of AI safety and AI safety engineering where we seek out corrigibility, docility, alignment, and security in machine systems, or probably even some combination of the two. It’s unclear what the outcome of AI will be, but what is more certain is that AI promises to produce and make relevant both age-old and novel moral considerations through areas such as algorithmic bias, technological disemployment, autonomous weapons, privacy, big data systems, and even possible phenomenal states in machines.

We’ll even see new ethical issues with what might potentially one day be super intelligence and beyond. Given this, I think I’d like to just dive in first with you Brian and then, with you David. If you could just get into what the foundation is of your moral view? Then, afterwards, we can dive into the metaethics behind it.

Brian: Sure. At bottom, the reason that I place foremost priority on suffering is emotion. Basically, the emotional experience of having suffered myself intensely from time to time and having empathy when I see others suffering intensely. That experience of either feeling it yourself or seeing others in extreme pain carries just a moral valence to me, or a spiritual sensation you might call it, that seems different from the sensation that I feel from anything else. It seems just obvious at an emotional level that, say, torture or being eaten alive by a predatory animal or things of that nature have more moral urgency than anything else. That’s the fundamental basis. You can also try to make theoretical arguments to come to the same conclusion. For example, people have tried to advance what’s called the asymmetry, which is the intuition that it’s bad to create a new being who will suffer a lot but it’s not wrong to fail to create a being that will be happy, or at least not nearly as wrong.

From that perspective, you might care more about preventing the creation of suffering beings than about creating additional happy beings. You can also advance the idea that maybe preferences are always a negative debt that has to be repaid. Maybe when you have a preference that’s a bad thing, and then it’s only by fulfilling the preference that you erase the bad thing. This would be similar to the way in which Buddhism says that suffering arises from craving. The goal is to cease the cravings, which can be done either through fulfilling the cravings, giving the organism what the organism wants, or not having the cravings in the first place. Those are some potential theoretical frameworks from which to also derive a suffering-focused ethical view. For me personally, the emotional feeling is the most important basis.

David: I would very much like to echo what Brian was saying there. I mean, there is something about the nature of intense suffering. One can’t communicate it to someone who hasn’t suffered, I mean someone who is, for example, born with congenital analgesia or insensitivity to pain, but there is something that is self-intimatingly nasty and disvaluable about suffering. However, evolution hasn’t engineered us, of course, to care impartially about the suffering of all sentient beings. My suffering and that of my genetic kin tend to matter far more to me than anything else. Insofar as we aspire to become transhuman and posthuman, we should be aspiring to this godlike perspective that takes into account the suffering of all sentient beings; the egocentric illusion is a genetically adaptive lie.

How does this tie in to the question of posthuman super intelligence? Of course, there are very different conceptions of what posthuman super intelligence is going to be. I’ve always had what one might call a more traditional conception of super intelligence, in which posthuman super intelligence is going to be our biological descendants, enhanced by AI but nonetheless still our descendants. However, there are what might crudely be called two other conceptions of posthuman super intelligence. One is this Kurzweilian fusion of humans and our machines, such that the difference between humans and our machines ceases to be relevant.

There’s another conception of super intelligence that you might say is in some ways the most radical: the intelligence explosion that was first conceived by I.J. Good but has been developed by Eliezer Yudkowsky, MIRI, and most recently by Nick Bostrom. It conceives of some kind of runaway explosion of recursively self-improving AI, with there being no guarantee that the upshot of this intelligence explosion is going to be in any way congenial to human values as we understand them. I’m personally skeptical about the intelligence explosion in this sense but, yeah, it’s worth clarifying what one means by posthuman super intelligence.

Lucas: Wonderful. Right. Before we dive into the metaethics behind these views and their potential relationship with AI alignment, and broaden the discussion to include ethics and explore some of these key terms, I’d just like to touch on the main branches of ethics to provide some context and mapping for us. Generally, ethics is understood to have three branches, those being metaethics, normative ethics, and applied ethics. Traditionally, applied ethics is viewed as the application of normative and metaethical views to specific cases and situations to determine the moral status of said case or situation in order to decide what ought to be done.

An example of that might be applying one’s moral views to factory farming to determine whether or not it is okay to factory farm animals for their meat. The next branch, moving upwards in abstraction, would be normative ethics, which examines and deconstructs or constructs the principles and ethical systems we use for assessing the moral worth and permissibility of specific actions and situations. This branch is traditionally viewed as the formal ethical structures that we apply to certain situations, and people are familiar with deontological ethics, consequentialism or utilitarianism, and virtue ethics. These are all normative ethical systems.

What we’ll be discussing today is primarily metaethics. Metaethics seeks to understand morality and ethics itself. It seeks to understand the nature of ethical statements, attitudes, motivation, properties and judgments. It seeks to understand whether or not ethics relates to objective truths about the world and about people, or whether it’s just simply subjective, or if all ethical statements are in fact false. It seeks to understand what people mean when they express ethical judgments or statements. This gets into things like ethical uncertainty and justification theories, and substantial theories, and semantic theories of ethics.

Obviously, these are all the intricacies of the end towards which AI may be aimed. The epistemology of metaethics and ethics in general also has major implications for what AIs might or might not be able to discover about ethics. Again, today we’ll just be focusing on metaethics and the metaethics behind David and Brian’s views. I guess just to structure this a little bit, let’s really start to use the formal language of metaethics. As a little bit of background again, semantic theories in ethics seek to address the question of what the linguistic meaning of moral terms or judgments is.

These are primarily concerned with whether or not moral statements contain truth values or are arbitrary and subjective. There are other branches within semantic theories but there are two main branches. The first of these is noncognitivism. Noncognitivism refers to a group of theories which hold that moral statements are neither true nor false because they do not express genuine propositions. Usually, these noncognitivist views are things like emotivism, where people think that when we express moral views or attitudes like “suffering is wrong,” we’re simply voicing an emotion like “boo, suffering,” or expressing that suffering bothers me or is bad to me, rather than making some sort of true or false claim about the world. Standing in contrast to noncognitivism is just cognitivism, which refers to a set of theories which hold that moral sentences express genuine propositions. That means that they can have truth values.

This is to say that they are capable of being true or false. Turning back to Brian and David’s views, how would you each view your moral positions as you’ve expressed them thus far? Would you hold yourself to a cognitivist view or a noncognitivist view? I guess we can start with you, David.

David: Yes. I’d just say it’s built into the nature of, let’s say, agony that agony is disvaluable. Now, you might say that there is nothing in the equations of physics and science that says anything over and above the experience itself, something like redness. Yeah, redness is subjective. It’s mind-dependent. Yet, unless one thinks minds don’t exist in the physical universe, redness is nonetheless an objective feature of the natural physical world. I would say that, for reasons we simply don’t understand, the pleasure-pain axis discloses the world’s inbuilt metric of value and disvalue. It’s not an open question whether something like agony is disvaluable to the victim.

Now, of course, someone might say, “Well, yes. Agony is disvaluable to you but it’s not disvaluable to me.” I would say that this reflects an epistemological limitation and that in so far as you can access what it is like to be me and I’m in agony, then you will appreciate why agony is objectively disvaluable.

Lucas: Right. The view here is a cognitivist view where you think that it is true to say that there is some intrinsic property or quality to suffering or joy that makes it I guess analytically true that it is valuable or disvaluable.

David: Yes. Well, one has to be very careful about using something like “analytically” because, yeah, someone could say that God is talking to me and it is analytically true that these voices are the voices of God. Yeah, one needs to be careful not to smuggle in too much. It is indeed very mysterious what this hybrid descriptive-evaluative state of finding something valuable or disvaluable could be. The intrinsic nature of the physical is very much an open question. I think there are good, powerful reasons for thinking that reality is exhaustively described by the equations of physics. The intrinsic nature of that stuff, the essence of the physical, the fire in the equations, is controversial. Physics itself is silent.

Lucas: Right. I guess here, you would describe yourself given these views as a moral realist or an objectivist.

David: Yes, yes.

Brian: Just to jump in before we get to me. Couldn’t you say that your view is still based on mind-dependence? At least based on the point that if somebody else were hooked up to you, that person would appreciate the badness of suffering, that’s still just dependent on that other mind’s judgment. Or even if you have somebody who could mind-meld with the whole universe and experience all suffering at once, that would still be dependent on that mind. That mind is judging it to be a bad thing. Isn’t it still mind-dependent ultimately?

David: Mind-dependent, but I would say that minds are features of the physical world. Obviously one can argue for some kind of dualism, but I’m a monistic physicalist; at least that’s my working assumption.

Brian: I think objective moral value usually … the definition is usually that it’s not mind-dependent. Although, maybe it just depends what definition we’re using.

David: Yes. It’s rather like the term “physicalism,” which is often used as a stylistic variant of materialism. One can be a non-materialist physicalist and an idealist. As I said, minds are objective features of the physical world. I mean, I at least tentatively, at any rate, take seriously the idea that our experience discloses the intrinsic nature of the physical. This is obviously a controversial opinion. It’s associated with someone like Galen Strawson or, more recently, Phil Goff, but it stretches back via Grover Maxwell and Russell, ultimately to Schopenhauer. A much more conventional view of course would be that the intrinsic nature of the physical, the fire in the equations, is non-experiential. Then, at some time during the late Precambrian, something happened: not just an organizational but an ontological eruption into the fabric of the world of first-person experience.

Lucas: Just to echo what Brian was saying. The traditional objectivist or moral realist view parallels the way in which science is the project of interrogating third-person facts, like what is simply true about the world regardless of what we think about it. In some ways, I think that traditionally the moral realist view is that if morality deals with objective facts, then these facts are third-person, objectively true, and can be discovered through the methods and tools of ethics, in the same way that someone who is a mathematical realist would say that one does not invent certain geometric objects; rather, one discovers them through the application of mathematical reasoning and logic.

David: Yes. I think it’s very tempting to think of first-person facts as having some kind of second-rate ontological status, but as far as I’m concerned, first-person facts are real. If someone is in agony or experiencing redness, these are objective facts about the physical world.

Lucas: Brian, would you just like to jump in with the metaethics behind your own view that you discussed earlier?

Brian: Sure. On cognitivism versus noncognitivism, I don’t have strong opinions because I think some of the debate is just about how people use language, which is not a metaphysically fundamental issue. It’s just however humans happen to use language. I think the answer to the cognitivism versus noncognitivism question, if I had to say something, would be that it’s probably messy. Humans do talk about moral statements the way they talk about other factual statements. We use reasoning and we care about maintaining logical consistency among sets of moral statements. We treat them as regular factual statements in that regard. There may also be a sense in which moral statements do strongly express certain emotions. I think probably most people don’t really think about it too much.

It’s like people know what they mean when they use moral statements and they don’t have a strong theory of exactly how to describe what they mean. One analogy that you could use is that moral statements are like swear words. They’re used to make people feel more strongly about something or to express how strongly you feel about something. People think that they don’t just refer to one’s emotions, even at a subjective level. If you say, “My moral view is that suffering is bad,” that feels different than saying, “I like ice cream,” because there’s a deeper, more spiritual, or more fundamental sensation that comes along with the moral statements that doesn’t come along with the “I like ice cream” statements.

I think metaphysically, that doesn’t reflect anything fundamental. It just means that we feel differently about moral statements and thoughts than about nonmoral ones. Subjectively, it feels different. Yeah. I think most people just feel that difference, and then exactly how you cash out whether that’s cognitive or noncognitive is a semantic dispute. My metaphysical position is anti-realism. I think that moral statements are mind-dependent. They reflect ultimately our own preferences, even if they may be very spiritual and deep, fundamental preferences. I think Occam’s Razor favors this view because it would add complexity to the world for there to be independent truths. I’m not even sure what that would mean. Based on similar reasoning, I reject mathematical truths and anything non-physicalist. I think moral truths, mathematical truths and so on can all be thought of as fictional constructions that we make. We can reason within these fictional universes of ethics and mathematics that we construct using physical thought processes. That’s my basic metaphysical stance.

Lucas: Just stepping back to the cognitivism and noncognitivism issue, I guess I was specifically interested in yourself. When you were expressing your own moral view earlier, did you find that it’s simply a mixture of expressing your own emotions and also, trying to express truth claims or given your anti-realism, do you think that you’re simply only expressing emotions when you’re conveying your moral view?

Brian: I think of myself very much as an emotivist. It’s very clear to me that what I’m doing when I do ethics is what the emotivists say people are doing. Yes, since I don’t believe in moral truth, it would not make sense for me to be gesturing at moral truths, except maybe in so far as my low-level brain wiring intuitively thinks in those terms.

David: Just to add to this: although it is possible to imagine, say, something like spectrum inversion, color inversion, or some people who like ice cream and some people who hate ice cream, one thing it isn’t possible to do is imagine a civilization with an inverted pleasure-pain axis. It seems to just be a basic fact about the world that unbearable agony and despair are experienced as disvaluable, and even cases that might appear to contradict this, like, say, masochists, in fact merely confirm the claim because, yeah, I mean, the masochist enjoys the intensely rewarding release of endogenous opioids when the masochist undergoes activities that might otherwise be humiliating or painful.

Lucas: Right. David, it seems you’re making a claim about there being a perfect convergence in the space of all possible minds among the pleasure-pain axis having the same sort of function. I guess I’m potentially just missing the gap or pointing out the gap between that and I guess your cognitivist objectivism?

David: It seems to be built into the nature of, let’s say, agony or despair itself that it is disvaluable. It’s not, “I’m in agony. Is this disvaluable or not?” It’s not an open question, whereas with anything else, however abhorrent you or I might regard it, one can still treat it as an open question and ask, is child abuse or slavery really disvaluable? Whereas in the case of agony, it’s built into the nature of the experience itself.

Lucas: I can get behind that. I think that sometimes when I’m feeling less nihilistic about morality, I am committed to that view. I think just to push back a little bit here. I think in the space of all possible minds, I think I can imagine a mind which has a moral judgment and commitment to the maximization of suffering within itself and within the world. It’s simply … it’s perfect in that sense. It’s perfect in maximizing suffering for itself in the world, and its judgment and moral epistemology is very brittle, such that it will never change or deviate from this. How would you deal with something like that?

David: Is it possible? I mean, one can certainly imagine a culture in which displays of machismo and the ability to cope with great suffering are highly valued and would be conspicuously displayed. This would be fitness-enhancing but, nonetheless, it doesn’t really challenge the sovereignty of the pleasure-pain axis as the axis of value and disvalue. Yeah, I would struggle to conceive of some kind of intelligence that values its own despair or agony.

Brian: From my perspective, I agree with what Lucas is saying, depending on how you define things. One definition of suffering could be that part of the definition is the desire to avoid it. From that perspective, you could say it’s not possible for an agent to seek something that it avoids. I think you could have systems where there are different parts in conflict, so you could have a hedonic assessment system that outputs a signal that this is suffering, but then another system chooses to favor the suffering. Humans even have something like this when we override our own suffering. We might have hedonic systems that say going out in the cold is painful, but then we have other systems or other signals that override that avoidance response and cause us to go out in the cold anyway for the sake of something else. You could imagine the wiring being such that it wasn’t just enduring pain for some greater good, but the motivational system was actively seeking to cause the hedonic system more experiences of pain. It’s just that that would be highly nonadaptive, so we don’t see that anywhere in nature.

David: I would agree with what Brian says there. Yes, very much so.

Lucas: Okay. Given these views that you guys have expressed, and now that we’re starting to get a better sense of them, another branch of metaethics that we might explore, to see how it fits in with your theories, is justification theories. These are attempts at understanding moral epistemology and the motivation for acting in accordance with morality. They attempt to answer the questions: how are moral judgments to be supported or defended, and, if it is possible, how does one make moral progress? This again will include moral epistemology. In terms of AI and value alignment, whether one is an anti-realist as Brian is or an objectivist as David is completely changes the path forward towards AI alignment and value alignment. If we are realists as David is, then a sufficiently robust and correct moral epistemology in an AI system could essentially realize the hedonistic imperative as David sees it, where you would just have an optimization process extending out from planet Earth, maximizing for the objectively good hedonic states in all possible sentient beings. I guess it’s a little unclear to me how this fits in with David’s theory or how David’s theory would be implemented.

David: There is a real problem with any theory of value that makes sovereign either the minimization of suffering or classical utilitarianism. Both Buddhism and negative utilitarianism appear to have this apocalyptic implication: if our overriding responsibility is to minimize suffering, then isn’t the cleanest, quickest, most efficient way to eliminate suffering to sterilize the planet, which is now technically feasible? Though one can in theory imagine cosmic rescue missions if there is sentience elsewhere, there is apparently this not-so-disguised apocalyptic implication. When Buddha says, allegedly or hopefully, “I teach one thing and one thing only: suffering and the relief of suffering,” or the end of suffering, yeah, in his day, there was no way to destroy the world. Today, there is.

Much less discussed, indeed I haven’t seen it adequately discussed, or discussed at all, in the scholarly literature, is a disguised implication of a classical utilitarian ethic that gives this symmetry to pleasure and pain: that we ought to be launching something like a utilitronium shockwave, where utilitronium is matter and energy optimized for pure bliss and “shockwave” alludes to its velocity of propagation. Humans are perhaps extremely unlikely, even if and when we’re in a position to do so, to launch a utilitronium shockwave. But if one imagines a notional artificial super intelligence with a utility function of classical utilitarianism, why wouldn’t that super intelligence launch a utilitronium shockwave that maximizes the cosmic abundance of positive value within our cosmological horizon?

Personally, I would imagine a future of gradients of intelligent bliss. I think it is in fact sociologically highly likely that posthuman civilization will have a hedonic range far above ours. Very crudely and schematically, ours is minus 10 to zero to plus 10; I can imagine a future civilization of, let’s say, plus 70 to plus 100, or plus 90 to plus 100. From the perspective of classical utilitarianism (and classical utilitarianism, or some kind of watered-down version at least, is arguably the dominant secular ethic in academia and elsewhere), that kind of civilization is suboptimal. It’s not moral; one apparently has this obligation to launch this kind of cosmic orgasm, so to speak.

Lucas: Right. I mean, I think just pushing back a little bit on the first thing that you said there about the very negative scenario, which I think people tend to see as an implication of a suffering-reduction-focused ethic, where there can’t be any suffering if there are no sentient beings. That to me isn’t very plausible because it discounts the possibility of future wellbeing. I take the view that we actually do have a moral responsibility to create more happy beings, and I view there as being a symmetry between wellbeing and suffering. I don’t have a particularly suffering-focused ethic where I think there’s an asymmetry, where I think we should alleviate suffering prior to maximizing wellbeing. I guess, David, maybe you can just unpack a little bit, before we jump into these justification theories, whether or not you view there as being an asymmetry between suffering and wellbeing.

David: I think there’s an asymmetry. There’s this fable by Ursula Le Guin, the short story “The Ones Who Walk Away from Omelas.” We’re invited to imagine this city of delights, a vast city of incredible, wonderful pleasures, but the existence of Omelas, this city of delights, depends on the torment and abuse of a single child. The question is, would you walk away from Omelas, and what does walking away from Omelas entail? Now, personally, I am someone who would walk away from Omelas. The world does not have an off switch, an off button, and I think whether one is a Buddhist or a negative utilitarian, or someone who believes in suffering-focused ethics, rather than considering these theoretical apocalyptic scenarios it is more fruitful to work with secular and religious life lovers to phase out the biology of suffering in favor of gradients of intelligent wellbeing. One of the advantages of hedonic recalibration, i.e. ratcheting up hedonic set points, is that it doesn’t ask people to give up their existing values and preferences with complications.

To take just a convenient, rather trivial example: imagine 100 people supporting 100 different football teams. There’s simply no way to reconcile their conflicting preferences, but what one can do, if one ratchets up everyone’s hedonic set point, is improve quality of life. By focusing on ratcheting up hedonic set points rather than trying to reconcile the irreconcilable, I think this is the potential way forward.

Brian: There are a lot of different points to comment on. I agree with David that negative utilitarians should not aim for world destruction, for several reasons. One being that it would make people turn against the cause of suffering reduction. It’s important to have other people not regard that cause as something to be appalled by. For example, animal rights terrorists plausibly give the animal rights movement a pretty bad name and may set back the cause of animal rights by doing that. Negative utilitarians would almost certainly not succeed anyway, so the most likely outcome is that they hurt their own cause.

As far as David’s suggestion of improving wellbeing to reduce disagreements among competing football teams, I think that would potentially help; giving people greater wealth and equality in society can reduce some tensions. I think there will always be some insatiable appetites, especially from moral theories. For example, a classical utilitarian has an insatiable appetite for computational resources. Egoists and other moral people may have their own insatiable appetites. We see that in the case of humans trying to acquire wealth beyond what is necessary for their own happiness. I think there will always be those agents who want to acquire as many resources as possible. The power maximizers will tend to acquire power. I think we still have additional issues of coordination and of social science being used to control the thirst for power among certain segments of society.

Lucas: Sorry. Just to get this clear. It sounds like you guys are both committed to different forms of hedonic consequentialism. You’re bringing up preferences and other sorts of things. Is there room for ultimate metaphysical value of preferences within your ethics? Or are preferences simply epistemically and functionally useful indicators of what will often lead to positive hedonics in agents within your ethical theories?

Brian: Personally, I care to some degree about both preferences and hedonic wellbeing. Currently, I care somewhat more about hedonic wellbeing just based on … from my metaethical standpoint, it’s ultimately my choice what I want to care about. I happen to care a lot about hedonic suffering when I imagine it. From a different standpoint, you can argue that ultimately the golden rule, for example, commits you to caring about whatever it is that another organism cares about, whether that’s hedonic wellbeing or some arbitrary wish. A deathbed wish would be a good example of a preference that doesn’t have hedonic content to it: the question is whether you think it’s important to keep deathbed wishes even after a person has died, ignoring side effects in terms of later generations realizing that promises are not being kept.

I think even ignoring those side effects, a deathbed wish does have some moral importance based on the idea that if I had a deathbed wish, I would strongly want it to be carried out. If you are acting the way you want others to treat you, then you should care to some degree about other people’s deathbed wishes. Since I’m more emotionally compelled by extreme hedonic pain, that’s what I give the most weight to.

Lucas: What would your view be of an AI or machine intelligence which has a very strong preference, whatever that computational architecture might look like, that a bit be flipped one way rather than another? It just keeps flipping a bit back and forth, and then you would have a preference utilitronium shockwave going out in the world. It seems intuitive to me also that we only care about preferences in so far as they … I guess what this previous example does for me is show that we only care about preferences in so far as they have hedonic effects. I’ll bite the bullet on the deathbed wish thing. I think that, ignoring side effects, if someone wishes for something and then they die, I don’t think that we need to actually carry it out if we don’t think it will maximize hedonic wellbeing.

Brian: Ignoring the side effects. There are probably good hedonistic reasons to fulfill deathbed wishes so that current people will not be afraid that their wishes won’t be kept also. As far as the bit flipping, I think a bit-flipping agent’s preference does have moral significance, but I weigh organisms in proportion to the sophistication of their minds. I care more about a single human than a single ant, for example, because a human has more sophisticated cognitive machinery. It can do more kinds of … have more kinds of thoughts about its own mental states. When a human has a preference, there’s more stuff going on within its brain to back that up, so to speak. A very simple computer program that has a very simple preference to flip a bit doesn’t matter very much to me because there’s not a lot of substance behind that preference. You could think of it as an extremely simple mind.

Lucas: What if it’s a super intelligence that wants to keep flipping bits?

Brian: In that case, I would give it significant weight because it has so much substance in its mind. It probably has lots of internal processes that are reflecting on its own welfare, so to speak. Yeah, if it’s a very sophisticated mind, I would give that significant weight. It might not override the preferences of seven billion humans combined. I tend to give less than linear weight to larger brains. As the size of the brain increases, I don’t scale the moral weight of the organism exactly linearly. That would also reduce that utility monster conclusion.
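
[Editor’s illustration] The less-than-linear weighting Brian describes can be made concrete with a small sketch. The power-law form, the exponent of 0.5, and the relative brain sizes below are all assumptions chosen for illustration; Brian does not specify a formula in the conversation.

```python
def moral_weight(brain_size: float, exponent: float = 0.5) -> float:
    """Hypothetical moral weight of one mind.

    brain_size is measured relative to a human brain (human = 1.0).
    An exponent below 1 gives sublinear ("less than linear") scaling;
    an exponent of exactly 1 gives linear scaling.
    """
    if brain_size < 0:
        raise ValueError("brain_size must be non-negative")
    return brain_size ** exponent


HUMAN_POPULATION = 7_000_000_000   # "seven billion humans", as in the transcript
SUPER_MIND_SIZE = 1e12             # assumed: a mind a trillion times human size

# Combined weight of all humans, each with weight 1.0.
humans_total = HUMAN_POPULATION * moral_weight(1.0)

# Weight of the single very large mind under each scaling assumption.
super_sublinear = moral_weight(SUPER_MIND_SIZE, exponent=0.5)
super_linear = moral_weight(SUPER_MIND_SIZE, exponent=1.0)

# Under sublinear scaling the large mind does not outweigh humanity;
# under linear scaling it does, which is the utility monster worry.
print("sublinear mind outweighs humanity:", super_sublinear > humans_total)
print("linear mind outweighs humanity:", super_linear > humans_total)
```

With these assumed numbers, the single large mind dominates under linear scaling but not under square-root scaling, which is the sense in which sublinear weighting softens the utility monster conclusion.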

Lucas: Given Brian’s metaethics being an anti-realist and viewing him as an emotivist, I guess the reasons or arguments that you could provide against this view would only be, they don’t refer back to any metaphysical objective, anything really. David, wouldn’t you say that in the end, it would just be your personal emotional choice whether or not to find something compelling here.

David: It’s to do with the nature of first person facts. What is it that the equations of physics ultimately describe? If you think subjectivity, or at least take seriously the conjecture that subjectivity, is the essence of the physical, the fire in the equations, then yeah, it’s just objectively the case that first person agony is disvaluable. Here we get into some very controversial issues. I would just like to go back to one thing Brian was saying about sophistication. I don’t think it’s plausible that, let’s say, a pilot whale is more cognitively sophisticated than humans, but it’s very much an open question whether a pilot whale, with a substantially larger brain, substantially larger neocortex, and substantially larger pain and pleasure centers, undergoes an intensity of experience that may be greater than that of humans. Therefore, other things being equal, I would say that profoundly aversive states undergone by the whale matter more than a human’s. It’s not the level of sophistication or complexity that counts.

Lucas: Do you want to unpack a little bit your view about the hedonics versus the preferences, and whether or not preferences have any weight in your view?

David: Only indirect weight, in that ultimately, yeah, as I said, I think what matters is the pleasure-pain axis, and preferences only matter in so far as they impact that. Thanks to natural selection, we have countless millions and billions of preferences that are being manufactured all the time. As social primates, countless preferences conflict with each other. There is simply no way to reconcile a lot of them. Whereas one can continue to enrich and enhance wellbeing. So, yeah, sure, other things being equal, satisfy people’s preferences. In so many contexts, it is logically impossible to do so, from politics, the Middle East, and interpersonal relationships to people’s desire to be the world famous this, that or the other. It is logically impossible to satisfy a vast number of preferences.

Lucas: I think it would be interesting and useful to dive into, within justification theories, like moral epistemology and ethical motivation. I think I want to turn to Brian now. Brian, I’m so curious to know if it’s possible given your view of anti-realism and suffering focused ethics, whether or not you can make moral progress or what it means to make moral progress. How does one navigate the realm of moral issues in your view, given the metaethics that you hold? Why ought I or others, or why not ought I or others to follow your ethics or not?

Brian: Moral progress, I think, can be thought of like this: many people have a desire to improve their own moral views using standards of improvement that they choose. For example, a common standard would be, I think, that the moral views that I will hold after learning more, I will generally now defer to those views as the better ones. There might be some exceptions, especially if you get too much into some subject area that distorts your thinking relative to the way it was before. Basically, you can think of brain state changes as either being approved of or not approved of by the current state. Moral progress would consist of doing updates to your brain that you approve of, like installing updates to a computer that you choose to install.

That’s what moral progress would be. Basically, you designate which changes you want to happen, and then, if those happen according to the rules, it’s progress relative to what your current state thought. You can have failures of goal preservation. The example that Eliezer Yudkowsky gives is if you give Gandhi a pill that would make him want to kill people. He should not take it because that would change his goals in a way that his current goals don’t approve of. That would be moral anti-progress relative to Gandhi’s current goals. Yeah, that’s how I would think of it. Different people have different preferences about how much of what you can call preference idealization they want.

Preference idealization is the idea of imagining what preferences you would hold if you knew more, were smarter, had more experiences, and so on. Different people could want different amounts of preference idealization. There are some people who say, “I have almost no idea what I currently value and I want to defer that to an artificial intelligence to help me figure that out.” In my case, it’s very clear to me that extreme suffering is what I want to continue to value, and if I change from that stance, that would be a failure of goal preservation relative to my current values. There are still questions on which I do have significant uncertainty, in the sense that I would defer to my future self.

For example, the question of how to weigh different brain complexities against each other is something where I still have significant uncertainty. Another is the question of how much weight to give to what are called higher-order theories of consciousness versus first-order theories, basically how much you think that high-level thoughts are an important component of what consciousness is. That’s an issue where I have significant moral uncertainty. There are issues where I want to learn more, think more about it, and have more other people think about it before I make up my mind fully on what I think about that. Then, why should you hold my moral view? The real answer is because I want you to, and I’ll try to come up with arguments to make it sound more convincing to you.

David: I find subjectivism troubling. My football team is Manchester United. I wouldn’t take a pill that induced me to support Manchester City because that would subvert my values in some sense. Nonetheless, ultimately, support for Manchester United is arbitrary. Is support for the reduction of suffering merely akin to one’s support, let’s say, of Manchester United?

Brian: I think metaphysically, they’re the same. It feels very different. There’s more of a spiritual, like your whole being is behind reduction of suffering in the way that’s not true for football teams. Ultimately, there’s no metaphysical difference.

David: Intentional objects ultimately are arbitrary, in that natural selection has skewed us to define certain intentional objects. This is philosophy jargon for the things we care about, whether it’s football or politics, or anything. Nonetheless, unlike these arbitrary intentional objects, it just seems to be built into the nature of agony or despair that they are disvaluable. It’s simply not possible to instantiate such states and find it an open question whether they’re disvaluable or not.

Brian: I don’t know if we want to debate now, but I think it is possible. I mean, we already have examples of one organism who finds the suffering of another organism to be positively valuable.

David: They are not mirror-touch synesthetes. They do not accurately perceive what is going on, and in so far as one does, either as a mirror-touch synesthete or by doing the equivalent of a Vulcan mind meld or something like that, one is not going to perceive the disvaluable as valuable. It’s an epistemological limitation.

Brian: My objection to that is it depends how you hook up the wires between the two minds. Like if you hook up one person suffering to another person’s suffering, then the second person will say it’s also bad. If you hook up one person’s suffering neurons to another person’s pleasure neurons, then, the second person will say it’s good. It just depends how you hook up the wires.

David: It’s not all or nothing, but if one is, let’s say, a mirror-touch synesthete today and someone stubs their toe and you have an experience of pain, it’s simply not possible to take pleasure in their stubbing their toe. I think if one does have this notional god’s eye perspective, an impartial view from nowhere, then one will act accordingly.

Brian: I disagree with that because I think you can always imagine just reversing the motivational wires so to speak. Just flip the wire that says this is bad. Flip it to saying this is good in terms of the agent’s motivation.

David: Right. Yes. I was trying to visualize what this would entail.

Brian: Even in a synesthete example, just imagine a brain where the same stimulus currently in normal humans, this stimulus triggers negative emotional responses just have the neurons hook up to the positive emotional responses instead.

David: Once again, wouldn’t this be an epistemological limitation rather than some deep metaphysical truth about the world?

Brian: Well, it depends how you define epistemology but you could be a psychopath where you correctly predict another organism’s behavior but you don’t care. You can have a difference between beliefs and motivations. The beliefs could correctly recognize this I think but the motivations could have the wires flipped such that there’s motivation to cause more of the suffering.

David: It’s just that I would say that the psychopath has an epistemological limitation in that the psychopath does not adequately take into account other perspectives. In that sense, psychopath lacks an adequate theory of mind. The psychopath is privileging one particular here and now over other here and nows, which is not metaphysically sustainable.

Brian: It might be a definitional dispute like whether you can consider having proper motivation to be part of epistemological accuracy or not. It seems that you’re saying if you’re not properly motivated to reduce … you don’t have proper epistemological access to it by definition.

David: Yes. One has to be extremely careful with using this term “by definition.” Yes. I would say that we are all to some degree sociopathic. One is quasi-sociopathic to one’s future self, for example, in so far as one, let’s say, doesn’t prudently save but squanders money and stuff. We are far more psychopathic towards other sentient beings because one is failing fully to take into account their perspective. It’s a hardwired epistemological limitation. One thing I would very much agree with Brian on is moral uncertainty and being prepared to reflect and take into account other perspectives and allow for the possibility one can be wrong. It’s not always possible to have the luxury of moral reflection and uncertainty.

If a kid is drowning, hopefully one dashes into the water to save the kid. Is this the right thing to do? Well, what happens if the kid, this is the real story, happens to be a toddler who grows up to be Adolf Hitler and plunges the world into war? One doesn’t know the long term consequences of one’s actions. Wherever possible, yes, one urges reflection and caution. In the context of a discussion or debate, one isn’t always qualifying one’s uncertainty and agnosticism carefully, but in a more deliberative context perhaps one should certainly do so.

Lucas: Let’s just bring it a little bit back to the ethical epistemology behind and ethical motivation behind your hedonistic imperative given your objectivism. I guess here, it’d also be interesting to know if you could also explore key metaphysical uncertainties and physical uncertainties, and what more and how we might go about learning about the universe such that your view would be further informed.

David: Happy to launch into a long spiel about my view. One thing I think it really is worth stressing is that one doesn’t need to buy into any form of utilitarianism or suffering-focused ethics to believe that we can and should phase out the biology of involuntary suffering. It’s common to all manner of secular and religious views that we should be, other things being equal, minimizing suffering and reducing unnecessary suffering, and this is one thing that technology, biotechnology, allows us to do: support for something like universal access to preimplantation genetic screening, phasing out factory farming and shutting slaughterhouses, going on to essentially reprogram the biosphere.

It doesn’t involve a commitment to some particular one specific ethical or meta-ethical view. For something like pain-free surgery anesthesia, you don’t need to sign up for it to recognize it’s a good thing. I suppose my interest is very much in building bridges with other ethical traditions. Yeah, I am happy to go into some of my own personal views but I just don’t want to tie this idea that we can use bio-tech to get rid of suffering into anything quirky or idiosyncratic to me. I have a fair number of idiosyncratic views.

Lucas: It would be interesting if you’d explain whether or not you think that super intelligences or AGI will necessarily converge on what you view to be objective morality or if that is ultimately down to AI researchers to be very mindful of implementing.

David: I think there is a real risk here when one starts speaking as though posthuman super intelligence is going to end up endorsing a version of one’s own views and values, which a priori, if one thinks about it, is extremely unlikely. I think too one needs to ask, yeah, when I was talking about posthuman super intelligence: if posthuman super intelligence is our biological descendant, I think posthuman super intelligence will have a recognizable descendant of the pleasure-pain axis. I think it will be ratcheted up so that, say, experience below hedonic zero is impossible.

In that sense, I do see a convergence. By contrast, if one has a conception of post human super intelligence such that post human super intelligence may not be sentient, may not be experiential at all then, there is no guarantee that such a regime would be friendly to anything recognizably human in its values.

Lucas: The crux here is that there are different ways of doing value alignment, and one such way is descriptively, through a super intelligence being able to gain enough information about the set of all values that human beings have and, say, aligning to those or to some fraction of those or to some idealized version of those through something like a coherent extrapolated volition. Another one is where we embed a moral epistemology within the machine system, so that the machine becomes an ethical reasoner, almost a moral philosopher in its own right. It seems that given your objectivist ethics, with that moral epistemology it would be able to converge on what is true. Do these different paths forward make sense to you? It also seems that the role of mind melding is very crucial and core to the realization of the correct ethics in your view?

David: With some people, their hearts sink when the topic of machine consciousness crops up because they know it’s going to be a long inconclusive philosophical discussion with a shortage of any real empirical tests. Yeah, I will just state: I do not think a classical digital computer is capable of phenomenal binding, therefore it will not understand the nature of consciousness or pleasure and pain, and I see the notion of value and disvalue as bound up with the pleasure-pain axis. In that sense, I think what we’re calling machine artificial general intelligence is, in one sense, invincibly ignorant. I know a lot of people would disagree with this description, but if you think that humans, or at least some humans, spend a lot of their time thinking about, talking about, and exploring consciousness in all its varieties, in some cases exploring psychedelia, what are we doing? There is a vast range of cognitive domains that are completely cognitively inaccessible to digital computers.

Lucas: Putting aside the issue of machine consciousness, it seems that being able to first-person access hedonic states plays an extremely foundational and core motivational, or at least epistemological, role in your ethics, David.

David: Yes. I mean part of intelligence involves being able to distinguish the important from the trivial, which ultimately as far as I can see boils down to the pleasure-pain axis. Digital zombies have no conception of what is important or what is trivial I would say.

Lucas: Why would that be if a true zombie in the David Chalmers sense is functionally isomorphic to a human. Presumably that zombie would properly care about suffering because all of its functional behavior is the same. Do you think in the real world, digital computers can’t do the same functional computation that a human brain does?

David: None of us have the slightest idea how one would set about programming a computer to do the kinds of things that humans are doing when they talk about and discuss consciousness when they take psychedelics or discuss the nature of the self. I’m not saying work arounds are impossible. I just don’t think they’re spontaneously going to happen.

Brian: I agree. Just like building intelligence itself, it requires a lot of engineering to create those features of humanlike psychology.

Lucas: I don’t see why it would be physically or technically impossible to instantiate an emulation of that architecture or an architecture that’s basically identical to it in a machine system. I don’t understand why computer architecture, computer substrate is really so different from biological architecture or substrate such that it’s impossible for this case.

David: It’s whether one feels the force of the binding problem or not. The example one can give: imagine the population of the USA are skull-bound minds, and imagine them implementing any kind of computation you like, with ultra-fast electromagnetic signaling, far faster than the electrochemical signaling of the CNS as normally conceived. Nonetheless, short of a breakdown of monistic physicalism, there is simply no way that the population of the USA is spontaneously going to become a subject of experience, able to apprehend perceptual objects. Essentially, all you have is a micro-experiential zombie. The question is why are 86 billion odd membrane-bound, supposedly classical neurons any different?

Why aren’t we micro-experiential zombies? One way to appreciate, I think, the force, the adaptive role of phenomenal binding is to look at syndromes where binding even partially breaks down, such as simultanagnosia, where the subject can only see one thing at once. Or motion blindness, akinetopsia, where one can’t apprehend motion, or severe forms of schizophrenia where there is no longer any unitary self. Somehow right now, you instantiate a unitary world simulation populated by multiple phenomenally bound dynamical objects, and this is tremendously fitness enhancing.

The question is how can a bunch of membrane-bound nerve cells, a pack of neurons, carry out what is classically impossible. I mean, one can probe the CNS with temporally coarse-grained neuroscans … individual feature processors, edge detectors, motion detectors, color detectors. Apparently, there are no perceptual objects there. How is it that right now your mind/brain is capable of running this egocentric world simulation in almost real time? It’s an astonishing computational feat. I argue for a version of quantum mind, but one needn’t buy into this to recognize that it’s a profound and unsolved problem. I mean, why aren’t we like the population of the USA?

Lucas: Just to bring this back to the AI alignment problem, and putting aside issues in phenomenal binding and consciousness for a moment. Putting aside also the conception that super intelligence is likely to be some sort of biological instantiation, imagine the more mainstream AI safety approach, the MIRI idea of there being simply a machine super intelligence. I think this elucidates a lot of the interdependencies and difficulties where one’s meta-ethical views are intertwined in the end with what is true about consciousness and computation. It seems that in your view, David, it’s close to, or almost, maybe, perhaps impossible to actually do AI alignment or value alignment on machine super intelligence.

David: It is possible to do value alignment, but I think the real worry is that if you take the MIRI scenario seriously, this recursively self-improving software that will somehow … this runaway intelligence, there’s no knowing where it may lead. But MIRI, as far as I know, have a very different conception of the nature of consciousness and value. I’m not aware that they tackle the binding problem. I just don’t see that unitary subjects of experience or values, or a pleasure-pain axis, are spontaneously going to emerge from software. It seems to involve some form of strong emergence.

Lucas: Right. I guess to tie this back and ground it a bit. It seems that the portion of your metaethics, which is going to be informed by empirical facts about consciousness and minds in general is the view in there that without access to the phenomenal pleasure-pain axis, what you view to have an intrinsic goodness or wrongness to it because it is foundationally and physically, and objectively the pleasure-pain axis of the universe, the heat and the spark in the equation I guess as you say. Without access to that, then ultimately, one will go awry in one’s ethics if one does not have access to phenomenal hedonic states given that that’s the core of value.

David: Yeah. In theory, an intelligent digital computer stroke robot could impartially pave the cosmos with either dolorium or hedonium without actually understanding the implications of what it was doing. Hedonium, or utilitronium, being matter and energy optimized for pure bliss. Dolorium being matter and energy optimized, for lack of a better word, for pure misery or despair. The system in question would not understand the implications of what it was doing. I know a lot of people do think that, well, sooner or later, classical digital computers, our machines, are going to wake up. I don’t think it’s going to happen. We’re not talking about hypothetical quantum computers next century and beyond, simply an expansion of today’s programmable digital computers. I think they’re zombies and will remain zombies.

Lucas: Fully autonomous agents which are very free and super intelligent in relation to us will in your view require a fundamental access to that which is valuable, which is phenomenal states, which is the phenomenal pleasure-pain axis. Without that, it’s missing its key epistemological ingredient. It will fail in value alignment.

David: Yes, yeah, yeah. It just simply does not understand the nature of the world. It’s rather like claiming that a system is intelligent but doesn’t understand the second law of thermodynamics. It’s not a full spectrum super intelligence.

Lucas: I guess my open question there would be then, whether or not it would be possible to not have access to fundamental hedonic states but still be something of a Bodhisattva with a robust moral epistemology that was heading in the right direction or what might be objective.

David: The system in question would not understand the implications of what it was doing.

Lucas: Right. It wouldn’t understand the implications but if it got set off in that direction and it was simply achieving the goal, then I think in some cases we might call that value aligned.

David: Yes. One can imagine … Sorry Brian. Do intervene when you’re ready, but yeah, one could imagine, for example, being skeptical of the possibility of interstellar travel for biological humans but programming systems to go out across the cosmos, or at least within our cosmological horizon, and convert matter and energy into pure bliss. I mean, one needn’t assume that this will apply to our little bubble of civilization, but what do we do about inert matter and energy elsewhere in the galaxy? One can leave it as it is, or if one is, let’s say, a classical utilitarian, one could convert it into pure bliss. Yeah, one can send out probes. One could restructure, reprogram matter and energy in that way.

That would be a kind of compromise solution in one sense. Keep complexity within our little tiny bubble of civilization but convert the rest of the accessible cosmos into pure bliss. Though that technically would not, strictly speaking, maximize the abundance of positive value in our Hubble volume, nonetheless it could come extraordinarily close to it from a classical utilitarian perspective.

Lucas: Brian, do you have anything to add here?

Brian: Well, I disagree on many, many points. I think digital computation is capable of functionally similar enough processing as the brain does. Even if that weren’t the case, a paperclip maximizer with a very different architecture would still have a very sophisticated model of human emotions. Its motivations wouldn’t be hooked up to those emotions, but it would understand, in every other sense of the word “understand,” human pleasure and pain. Yeah, I see it more as a challenge of hooking up the motivation properly. As far as my thoughts on alignment in general based on my metaethics, I tend to agree with the default approach, like the MIRI approach, which is unsurprising because MIRI is also anti-realist on metaethics. That approach sees the task as taking human values and somehow translating them into the AI, and that could be done in a variety of different ways: learning human values implicitly from certain examples, or with some combination of maybe top-down programming of certain ethical axioms.

That comes down to exactly how you do alignment, and there are lots of approaches to that. The basic idea, that you need to specifically replicate the complexity of human values in machines, and the complexity of the way humans reason, and that it won’t be there by default, is in any case shared between my opinion and that of the mainstream AI alignment approach.

Lucas: Do you take a view then similar to that of coherent extrapolated volition?

Brian: In case anybody doesn’t know, coherent extrapolated volition is Eliezer Yudkowsky’s idea of giving the AI the meta … You could call it a metaethics. It’s a meta rule for learning values: to take humanity and think about what humanity would want to want if it was smarter, knew more, had more positive interactions with each other, and thought faster, and then try to identify points of convergence among the values of different idealized humans. In terms of theoretical things to aim for, I think CEV is one reasonable target for reasons of cooperation among other humans. I mean, if I controlled the world, I would prefer to have the AI implement my own values rather than humanity’s values because I care more about my values. Some human values are truly abhorrent to me and others seem at least unimportant to me.

In terms of getting everybody together to not fight endlessly over the outcome of AI in this theoretical scenario, CEV would be a reasonable target to strive for. In practice, I think that’s unrealistic like a pure CEV is unrealistic because the world does not listen to moral philosophers to any significant degree. In practice, things are determined by politics, economic power, technological and military power, and forces like that. Those determine most of what happens in the world. I think we may see approximations to CEV that are much more crude like you could say that democracy is an approximation to CEV in the sense that different people with different values, at least in theory, discuss their differences and then, come up with a compromise outcome.

Something like democracy, maybe power-weighted democracy in which more powerful actors have more influence, will be what ends up happening. The philosopher’s dream of idealizing values to perfection is unfortunately not going to happen. We can push in directions that are slightly more reflective. We can push towards slightly more reflection, towards slightly more cooperation, and things like that.

David: A couple of points. First, to use an example we touched on before: what would be coherent extrapolated volition for all the world’s football supporters? Essentially, there’s simply no way to reconcile all their preferences. One may say that if they were fully informed, football supporters wouldn’t waste their time passionately supporting one team or another, but essentially I’m not sure that the notion of coherent extrapolated volition there would make sense. Of course, there are more serious issues than football. The second thing, when it comes to the nature of value, regardless of one’s metaphysical stance on whether one’s a realist or an anti-realist about value: I think it is possible by biotechnology to create states that are empirically, subjectively far more valuable than anything physiologically feasible today.

Take Prince Myshkin in Dostoevsky’s The Idiot. Like Dostoevsky was a temporal lobe epileptic and he said, “I would give my whole life for this one instant.” Essentially, there are states of consciousness that are empirically super valuable, and rather than attempting to reconcile irreconcilable preferences, I think you could say that in so far as we aspire to long term full spectrum super intelligence, perhaps we should be aiming to create these super valuable states. I’m not sure whether it’s really morally obligatory. As I said, my own focus is on the overriding importance of phasing out suffering, but for someone who does give some weight or equal weight to positive experiences, positively valuable experiences, there is a vast range of valuable experience that is completely inaccessible to humans that could be engineered via biotechnology.

Lucas: A core difference here is going to be that given Brian’s view of anti-realism, AI alignment or value alignment would in the end be left to those powers which he described in order to resolve irreconcilable preferences. That is if human preferences don’t converge strongly enough after enough time and information that there are no longer irreconcilable preferences, which I guess I would suppose is probably wrong.

Brian: Which is wrong?

Lucas: That it would be wrong that human beings’ preferences would converge strongly enough that there would no longer be irreconcilable preferences after coherent extrapolated volition.

Brian: Okay, I agree.

Lucas: I’m saying that in the end, value alignment would be left up to economic forces, military forces, other forces to determine what comes out of value alignment. In David’s view, it would simply be down to whether we could get the epistemology right and could know enough about value and the pleasure-pain axis and the metaphysical status of phenomenal states; value alignment would be to capitalize on that. I didn’t mean to interrupt you, Brian. You want to jump in there?

Brian: I was going to say the same thing you did: I agree with David that there would be irreconcilable differences, and in fact, many different parameters of the CEV algorithm would probably affect the outcome. One example that you could give is that people tend to crystallize their moral values as they age. You could imagine somebody who was presented with utilitarianism as a young person would be more inclined toward that, whereas maybe if that person had been presented with deontology as a young person, the person would prefer deontology as he got older. So depending on seemingly arbitrary factors, such as the order in which you are presented with moral views, or what else is going on in your life at the time that you confront a given moral view, or 100 other inputs, the output could be sensitive to that. CEV is really a class of algorithms; depending on how you tune the parameters, you could get substantially different outcomes.

Yeah, CEV is an improvement even if there’s no obvious unique target. As I said, in practice, we won’t even get pure CEV but we’ll get some kind of very rough power-weighted approximation similar to our present world of democracy and competition among various interest groups for control.

Lucas: Just to explain how I’m feeling so far. I mean Brian, I’m very sympathetic to your view but I’m also very sympathetic to David’s view. I hover somewhere in between. I like this point that David made where he quoted Russell, something along the lines that one ought to be careful when discussing ethical metaphysics such that one is not simply trying to make one’s own views and preferences objective.

David: Yeah. When one is talking about, well, just in general, when one speaks about the nature of, for example, posthuman super intelligence, think of the way today that the very nature and notion of intelligence is a contested term. Simply sticking the word super in front of it: just how illuminating is it? When I read someone’s account of super intelligence, I’m really reading an account of what kind of person they are, their intellect and their values. I’m sure when I discuss the nature of full spectrum super intelligence, at least now, I can’t see the extent to which I’m simply articulating my own limitations.

Lucas: I guess for me here to get all my partialities out of the way, I hope that objectivism is true because I think that it makes the value alignment way less messy. In the end, we could have something actually good and beautiful, which I don’t know is some preference that I have that might be objective or not, just simply wrong, or confused. The descriptive picture that I think Brian is committed to, which gives rise to the MIRI and Tomasik form of anti-realism, is just one where in the beginning, there was entropy and noise and many generations of stars fusing atoms into heavier elements. One day one of these disks turned into a planet and a sun shone some light on a planet, and the planet began to produce people. There’s an optimization process there in the end, which simply seems to be ultimately driven by entropy, and morality seems to simply be a part of this optimization process, which just works to facilitate and mediate the relations between angry mean primates like ourselves.

Brian: I would point out there’s also a lot of spandrel to morality in my opinion, especially these days now that we’re not heavily optimized by biological pressures. This whole conversation that we’re having right now is a spandrel in the sense that it’s just an outgrowth of certain abilities that we evolved, but it’s not at all adaptive in any direct sense.

Lucas: Right. In this view, it really just seems like morality and suffering, and all of this, is just a byproduct of the screaming entropy and noise of whatever led to this universe. At the same time, and I think this is the part that the people who are committed to MIRI anti-realism, and I guess just relativism and skepticism about ethics in general, maybe are not tapping into enough, this objective process is producing a very real and objective phenomenal self and story, which is caught up in suffering, where suffering is really suffering and it really sucks to suffer. It all seems at face value true in that moment throughout the suffering that this is real. The suffering is real. The suffering is bad. It’s pretty horrible.

This bliss is something that I would never give up, or if the rest of the universe were this bliss, that would just be the most amazing thing ever. In this very subjective, phenomenal, just experiential thing that the universe produces, the subjective phenomenal story and narrative that we live, it seems there’s just this huge tension between the anti-realism and, I think, the clear suffering of suffering and just being a human being.

Brian: I’m not sure if there’s a tension because the anti-realist agrees that humans experience suffering as meaningful and they experience it as the most important thing imaginable. There’s not really a tension and you can explore why humans quest for objectivity. There seems to be a certain glow that attaches to things by saying that they’re objectively moral. That’s just a weird quirk of human brains. I would say that ultimately, we can choose to care about what we care about whether it’s subjective or not. I often say even if objective truth exists, I don’t necessarily care what it says because I care about what I care about. It could turn out that objective truth orders you to torture squirrels. If it does, then I’m not going to follow the objective truth. On reflection, I’m not unsatisfied at all with anti-realism because what more could you want than what you want?

Lucas: David, feel free to jump in if you’d like.

David: Well, there it’s just … there’s this temptation to oscillate between two senses of the word subjective: subjective in the sense of neither true nor false, and subjective in the sense of first-person experience. My being in agony or your being in agony or someone being in despair is, as I said, as much an objective property of reality as the rest mass of the electron. I mean, what we can be doing is working in such ways as to increase, in theory to maximize, the amount of subjective value in the world, regardless of whether or not one believes that this has any transcendent significance, with the proviso here that there is a risk that if one aims, strictly speaking, to maximize subjective value, one gets the utilitronium shockwave. If one is, as I personally advocate, aiming for a civilization of super intelligent bliss, one is not asking people to give up their core values and preferences, unless one of those core values and preferences is to keep hedonic set points unchanged. That’s not very intellectually satisfying but it’s … this idea if one is working towards some kind of consensus, compromise.

Lucas: I think now I want to get into a bit more just about ethical uncertainty and specifically with regards to meta-ethical uncertainty. I think that just given the kinds of people that we are, even if we disagree about realism versus anti-realism or ascribe different probabilities to each view, we might pretty strongly converge on how we ought to do value alignment given the kinds of moral considerations that we have. I’m just curious to explore a little bit more about what you guys are most uncertain about. What would it take to change your mind? What new information would you be looking for that might challenge or make you revise your metaethical view? How might we want to proceed with AI alignment given our metaethical uncertainty?

Brian: Can you do those one by one?

Lucas: Yeah, for sure. If I can remember everything I just said. First to start off, what are you guys most uncertain about within your meta-ethical theories?

Brian: I’m not very uncertain meta-ethically. I can’t actually think of what would convince me to change my metaethics because as I said, even if it turned out that metaphysically moral truth was a thing out there in some way, whatever that would mean, I wouldn’t care about it except for instrumental reasons. For example, if it was a god, then you’d have to instrumentally care about god punishing you or something, but in terms of what I actually care about, it would not be connected to moral truth. Yeah, it would have to be some sort of revision of the way I conceive of my own values. I’m not sure what it would look like to be meta-ethically uncertain.

Lucas: There’s a branch of metaethics, which has to tackle this issue of meta-ethical commitment or moral commitment to meta-ethical views. If some sort of meta-ethical thing is true, why ought I to follow what is metaethically true? In your view Brian, it is just simply why ought you not to follow or why ought it not matter for you to follow what is meta-ethically true if there ends up being objective moral facts.

Brian: The squirrel example is a good illustration: if ethics turned out to be, you must torture as many squirrels as possible, then screw moral truth. I don’t see what this abstract metaphysical thing has to do with what I care about myself. Basically, my ethics comes from empathy, seeing others in pain, wanting that to stop. Unless moral truth somehow gives insight about that, like maybe moral truth is somehow based on that kind of empathy in a sophisticated way; then it would be like another person giving me thoughts on morality. The metaphysical nature of it would be irrelevant. It would only be useful in so far as it would appeal to my own emotions and sense of what morality should be for me.

David: If I might interject. Undercutting my position and negative utilitarianism and suffering-focused ethics, I think it quite likely that posthuman super intelligence, an advanced civilization with a hedonic range ratcheted right up to 70 to 100 or something like that, would look back on anyone articulating the kind of view that I am, that anyone who believes in suffering-focused ethics does, and see it as some kind of depressive psychosis. One intuitively assumes that our successors will be wiser than we are, and perhaps, well, they will be in many ways. Yet in another sense, I think we should be aspiring to ignorance: once we have done absolutely everything in our power to minimize, mitigate, abolish and prevent suffering, I think we should forget it even existed. I hope that eventually any experience below hedonic zero will be literally inconceivable.

Lucas: Just to jump to you here David. What are your views about what you are most meta-ethically uncertain about?

David: It’s this worry that what one is doing, however much one is pronouncing about the nature of reality, or the future of intelligent life in the universe and so on, is really some kind of disguised autobiography. Given that, as for quite a number of people, sadly, pain and suffering have loomed larger in my life than pleasure, I may be turning this into a deep metaphysical truth about the universe. This potentially undercuts my view. As I said, I think there are arguments against the symmetry view: suffering is self-intimatingly bad, whereas there is nothing self-intimatingly bad about being an insentient system or a system that is really content. Nonetheless, yeah, I take seriously the possibility that all I’m doing is expressing obliquely my own limitations of perspective.

Lucas: Given these uncertainties and the difficulty and expected impact of AI alignment, if we’re again committing ourselves to this MIRI view of an intelligence explosion with quickly recursive self-improving AI systems, how would you both, if you were the king of AI strategy, how would you go about allocating your metaethics and how would you go about working on the AI alignment problem and thinking about the strategy given your uncertainties and your views?

Brian: I should mention that my most probable scenario for AI is a slow takeoff in which lots of components of intelligence emerge piece by piece rather than a localized intelligence explosion. As for the intelligence explosion, if it were a hard takeoff, a localized intelligence explosion, then, yeah, I think the diversity of approaches that people are considering is what I would do as well. It seems to me you have to somehow learn values, because in the same way that we’ve discovered that teaching machines by learning is more powerful than teaching them by hard-coding rules, you probably have to mostly learn values as well, although there might be hard coding mixed in. Yeah, I would just pursue a variety of approaches in the way that the current community is doing.

I support the fact that there is also a diversity of short term versus long term focus. Some people are working on concrete problems. Others are focusing on issues like decision theory and logical uncertainty and so on because I think some of those foundational issues will be very important. For example, decision theory could make a huge difference to the AI’s effectiveness as well as issues of what happens in conflict situations. Yeah, I think a diversity of approaches is valuable. I don’t have specific advice on when I would recommend tweaking current approaches. I guess I expect that the concrete problems work will mostly be done automatically by industry because those are the kinds of problems that you need to solve to make AI work at all. If anything, I might invest more in the kind of long-term approaches that practical applications are likely to ignore or at least put off until later.

David: Yes, because my background assumptions are different, it’s hard for me to deal with your question. If one believes that subjects of experience that could suffer could simply emerge at different levels of abstraction, I don’t really know how to tackle this because this strikes me as a form of strong emergence. One of the reasons why philosophers don’t like strong emergence is that essentially, all bets are off. Yeah, imagine if life hadn’t been reducible to molecular biology and hence, ultimately to quantum chemistry and physics. Yeah, I’m probably not the best person to answer your question.

I think in terms of real moral focus, I would like to see essentially the molecular signature of unpleasant experience identified and, essentially, just making it completely off limits and biologically impossible for any sentient being to suffer. If one also believes that there are or could be subjects of experience that somehow emerge in classical digital computers, then, yeah, I’m floundering; my theory of mind and reality would be wrong.

Lucas: I think touching on the paper that Kaj Sotala had written on suffering risks, I think that a lot of different value systems would also converge with you on your view, David. Whether or not we take the view of realism or anti-realism, I think that most people would agree with you. I think the issue comes about with, again, preference conflicts, where some people, and I think this might even be a widespread view in Catholicism, view suffering as really important because it teaches you things and/or it has some special metaphysical significance with relation to god. Within the anti-realism view, with Brian’s view, I would find it very… just dealing with varying preferences on whether or not we should be able to suffer is something I just don’t want to deal with.

Brian: Yeah, that illustrates what I was saying about how I prefer my values over the collective values of humanity. That’s one example.

David: I don’t think it would be disputed that sometimes suffering can teach lessons. The question is: are there any lessons that couldn’t be functionally replaced by something else? There is this idea that we can just offload the nasty side of life on to software. In the case of pain, nociception, one knows that, yeah, software systems can be programmed or trained up to avoid noxious stimuli without the nasty raw feels. Should we be doing the same thing for organic biological robots too? When it comes to this, the question of suffering, one can have quite fierce and lively disputes with someone who says that, yeah, they want to retain the capacity to suffer. This is very different from involuntary suffering. I think that quite often someone can see that no, they wouldn’t want to force another sentient being to suffer against their will. It should be a matter of choice.

Lucas: To tie this all into AI alignment again, really the point of this conversation is that again, we’re doing ethics on a deadline. If you survey the top 100 AI safety researchers or AI researchers in the world, you’ll see that they give a probability distribution of the likelihood of human level artificial intelligence with about a 50% probability at 2050. This, many suspect, will have enormous implications for Earth-originating intelligent life and our cosmic endowment. Our normative and descriptive and applied ethical practices that we engage with are all embodiments and consequential to the sorts of meta-ethical views, which we hold, which may not even be explicit. I think many people don’t really think about metaethics very much. I think that many AI researchers probably don’t think about metaethics very much.

The end towards which AI will be aimed will largely be a consequence of some aggregate of meta-ethical views and assumptions or the meta-ethical views and assumptions of a select few. I guess Brian and David, just to tie this all together, what do you guys view as really the practicality of metaethics in general and in terms of technology and AI alignment?

Brian: As far as what you said about metaethics determining the outcome, I would say maybe the implicit metaethics will determine the outcome but I think as we discussed before, 90 some percent of the outcome will be determined by ordinary economic and political forces. Most people in politics in general don’t think about metaethics explicitly but they still engage in the process and have a big impact on the outcome. I think the same will be true in AI alignment. People will push for things they want to push for and that’ll mostly determine what happens. It’s possible that metaethics could inspire people to be more cooperative depending on how it’s framed. CEV as a practical metaethics could potentially inspire cooperation if it’s seen as an ideal to work towards, although the extent to which it can actually be achieved is questionable.

Sometimes, you might have a naïve view where a moral realist assumes that a super intelligent AI would necessarily converge to the moral truth or at least a super intelligent AI could identify the moral truth and then, maybe all you need to do is program the AI to care about the moral truth once it discovers it. Those particular naïve approaches, I think would produce the wrong outcomes because there would be no moral truth to be found. I think it’s important to be wary of that assumption that a super intelligence will figure it out on its own and we don’t need to do the hard work of loading complex human values ourselves. It seems like the current AI alignment community largely recognizes that; they recognize that there’s a lot of hard work in loading values and it won’t just happen automatically.

David: In terms of metaethics, consider the nature of pain-free surgery, surgical anesthesia. When it was first introduced in the mid 19th century, it was for about 15 years controversial. There were powerful voices who spoke against it but nonetheless, very rapidly a consensus emerged, and we now almost all take anesthesia for granted for major surgery. It didn’t require a consensus on the nature of value and metaethics. It’s just that this is the obvious thing, given our nature. Clearly, I would hope that eventually something similar will happen not just for physical pain but also psychological pain too. Just as we now take it for granted that it was the right thing to do to eradicate smallpox, no one is seriously suggesting that we bring smallpox back, and it doesn’t depend on consensus on metaethics.

As for experience below hedonic zero, whose precise molecular signature we’ll possibly be able to find, I hope that consensus will emerge that we should phase it out too. Sorry, this isn’t much in the way of practical guidance to today’s roboticists and AI researchers but I suppose I’m just expressing my hope here.

Lucas: No, I think I share that. I think that we have to do ethics on a deadline but I think that there are certain ethical things whose deadline is much longer or which don’t necessarily have a real concrete deadline. I like… with your example of the pain anesthesia drugs.

Brian: In my view, metaethics is mostly useful for people like us or other philosophers and effective altruists who can inform our own advocacy. We want to figure out what we care about and then, we go for it and push for that. Then, maybe to some extent, it may diffuse through society in certain ways but in the start, it’s just helping us figure out what we want to push for.

Lucas: There’s an extent to which the evolution of human civilization has also been an evolution of metaethical views, which are consciously or unconsciously being developed. Brian, your view is simply that 90% of what has causal efficacy over what happens in the end are going to be like military and economic and just like raw optimization forces that work on this planet.

Brian: Also, politics and memetic spandrels. For example, people talk about the rise of postmodernism as a replacement of metaethical realism with anti-realism in popular culture. I think that is a real development. One can question to what extent it matters. Maybe it’s correlated with things like a decline in religiosity, which matters more. I think that is one good example of how metaethics can actually go popular and mainstream.

Lucas: Right. Just to bring this back, I think that in terms of the AI alignment problem, I try to, or at least I’d like to, be a bit more optimistic about how much causal efficacy each part of thinking has over the AI alignment problem. I like to not, or I tend not to, think that 90% of it will in the end be due to rogue impersonal forces like you’re discussing. I think that everyone, no matter who you are, stands to gain from more metaethical thinking, whether you take realist or anti-realist views. The expression of your values, or whatever you think your values might be, whether they might be conventional or relative or arbitrary in your view, or whether they might relate to some objectivity, is much less likely to happen in, I think, a reasonable and good way without sufficient metaethical thinking and discussion.

David: One thing I would very much hope is that before, for example, radiating out across the cosmos, we would sort out our problems on Earth and in the solar system first. Regardless of whether one is secular or religious, or a classical or a negative utilitarian, let’s not start thinking about colonizing nearby solar systems or anything there. Yeah, if one is an optimist, one may be thinking of opportunities forgone, but at least wait a few centuries. I think in a fundamental sense, we do not understand the nature of reality, and not understanding the nature of reality comes with not understanding the nature of value and disvalue, or the experience of value and disvalue as Brian might put it.

Brian: Unfortunately, I’m more pessimistic than David. I think the forces of expansion will be hard to stop as they always have been historically. Nuclear weapons are something that almost everybody wishes hadn’t been developed and yet they were developed. Climate change is something that people would like to stop but it has a force of its own due to the difficulty of coordination. I think the same will be true for space colonization and AI development as well that we can make tweaks around the edges but the large trajectory will be determined by the runaway economic and technological situation that we find ourselves in.

David: I fear Brian may be right. I used to sometimes think about the possibilities of so-called cosmic rescue missions if the rare earth hypothesis is false and suffering Darwinian life exists within our cosmological horizon. I used to imagine this idea that we would radiate out and prevent suffering elsewhere. A, I suspect the rare earth hypothesis is true, but B, I suspect even if suffering life forms do exist elsewhere within our Hubble volume, it’s probably more likely humans or our successors would go out and just create more suffering. It’s a rather dark and pessimistic view. In my more optimistic moments I think we will phase out suffering altogether in the next few centuries, but these are guesses really.

Lucas: We’re dealing with, ultimately, given AI and it being the most powerful optimization process or the seed optimization process to radiate out from Earth, I mean we’re dealing with potential astronomical waste or astronomical value, or astronomical disvalue. We can tie this again into moral uncertainty and start thinking about William MacAskill’s work on moral uncertainty, where we just do what might be like expected value calculations with regards to our moral uncertainty, try to be very mathematical about it, and consider the amount of matter and energy that we are dealing with here, given a super intelligent optimization process coming from Earth.

I think that tying this all together and considering it all should potentially play an important role in our AI strategy. I definitely feel very sympathetic to Brian’s views that in the end, it might all simply come down to these impersonal economic and political, and militaristic, and memetic forces which exist. Given moral uncertainty, given meta-ethical uncertainty and given the amount of matter and energy that is at stake, potentially some portion of AI strategy should play into circumventing those forces or trying to get around them or decrease them and their effects and hold on AI alignment.

Brian: Yeah. I think it’s tweaks around the edges as I said unless these approaches become very mainstream but I think the prior probability that AI alignment of the type that you would hope for becomes worldwide is low because the prior probability that any given thing becomes worldwide mainstream is low. You can certainly influence local communities who share those ideals and they can try to influence things to the extent possible.

Lucas: Right. I mean maybe something potentially more sinister is that it doesn’t need to become worldwide if there’s a singleton scenario or if the power and control over the AI is very small within a tiny organization or some smaller organization which has power and autonomy to do this kind of thing.

Brian: Yeah, I guess I would again say the probability that you will influence those people would be low. Personally, I would imagine it would be either within a government or a large corporation. Maybe we have disproportionate impact on AI developers relative to the average human. Especially as AI becomes more powerful, I would expect more and more actors to try to have an influence. Our proportional influence would decline.

Lucas: Well, I feel very pessimistic after all this. Morality is not real and everything’s probably going to shit because economics and politics is going to drive it all in the end, huh?

David: It’s also possible that we’re heading for a glorious future of super human bliss beyond the bounds of everyday experience and that this is just the fag end of Darwinian life.

Lucas: All right. David, we’ll be having I think as you say one day we might have thoughts as beautiful as sunsets.

David: What a beautiful note to end on.

Lucas: I hope that one day we have thoughts as beautiful as sunsets and that suffering is a thing of the past whether that be objective or subjective within the context of an empty cold universe of just entropy. Great. Well, thank you so much Brian and David. Do you guys have any more questions or anything you’d like to say or any plugs, last minute things?

Brian: Yeah, I’m interested in promoting research on how you should tweak AI trajectories if you are foremost concerned about suffering. A lot of this work is being done by the Foundational Research Institute, which aims to avert s-risks especially as they are related to AI. I would encourage people interested in futurism to think about suffering scenarios in addition to extinction scenarios. Also, people who are interested in suffering-focused ethics to become more interested in futurism and thinking about how they can affect long-term trajectories.

David: Visit my websites urging the use of biotechnology to phase out suffering in favor of gradients of intelligent bliss for all sentient beings. I’d also like just to say yeah, thank you Lucas for this podcast and all the work that you’re doing.

Brian: Yeah, thanks for having us on.

Lucas: Yeah, thank you. Two Bodhisattvas if I’ve ever met them.

David: If only.

Lucas: Thanks so much guys.

If you enjoyed this podcast, please subscribe. Give it a like or share it on your preferred social media platform. We’ll be back again soon with another episode in the AI Alignment series.

Podcast: Six Experts Explain the Killer Robots Debate

Why are so many AI researchers so worried about lethal autonomous weapons? What makes autonomous weapons so much worse than any other weapons we have today? And why is it so hard for countries to come to a consensus about autonomous weapons? Not surprisingly, the short answer is: it’s complicated.

In this month’s podcast, Ariel spoke with experts from a variety of perspectives on the current status of lethal autonomous weapons systems (LAWS), where we are headed, and the feasibility of banning these weapons. Guests include ex-Pentagon advisor Paul Scharre (3:40), artificial intelligence professor Toby Walsh (40:51), Article 36 founder Richard Moyes (53:30), Campaign to Stop Killer Robots founder Mary Wareham and Bonnie Docherty of Human Rights Watch (1:03:38), and ethicist and co-founder of the International Committee for Robot Arms Control, Peter Asaro (1:32:39).

Topics discussed in this episode include:

  • the history of semi-autonomous weaponry in World War II and the Cold War (including the Tomahawk Anti-Ship Missile)
  • how major military powers like China, Russia, and the US are imbuing AI in weapons today
  • why it’s so difficult to define LAWS and draw a line in the sand
  • the relationship between LAWS proliferation and war crimes
  • FLI’s recent pledge, where over 200 organizations and over 2800 individuals pledged not to assist in developing or using LAWS
  • comparing LAWS to blinding lasers and chemical weapons
  • why there is hope for the UN to address this issue

You can listen to the podcast above, and read the full transcript below. You can check out previous podcasts on SoundCloud, iTunes, GooglePlay, and Stitcher.

If you work with artificial intelligence in any way, and if you believe that the final decision to take a life should remain a human responsibility rather than falling to a machine, then please consider signing this pledge, either as an individual or on behalf of your organization.

Ariel: Hello. I’m Ariel Conn with the Future of Life Institute. As you may have seen, this month we announced a pledge against lethal autonomous weapons. The pledge calls upon governments and government leaders to create a future with strong international norms, regulations and laws against lethal autonomous weapons. But in the meantime signatories agree that they will neither participate in nor support the development, manufacture, trade, or use of lethal autonomous weapons. At the time of this recording, over 220 AI-related organizations and over 2800 individuals have signed. Signatories include Google DeepMind and its founders, University College London, the XPRIZE Foundation, Clearpath Robotics, Silicon Valley Robotics, the European Association for Artificial Intelligence, and many other AI societies and organizations from around the world. Additionally, people who signed include Elon Musk; Google’s head of research and machine learning, Jeff Dean; many other prominent AI researchers, such as Stuart Russell, Toby Walsh, Meredith Whittaker, Anca Dragan, and Yoshua Bengio; and even politicians, like British MP Alex Sobel.

But why? We’ve all seen the movies and read the books about AI gone wrong, and yet most of the signatories agree that the last thing they’re worried about is malicious AI. No one thinks the Terminator is in our future. So why are so many people in the world of AI so worried about lethal autonomous weapons? What makes autonomous weapons so much worse than any other weapons we have today? And why is it so hard for countries to come to a consensus about autonomous weapons? Not surprisingly, the short answer is: it’s complicated. For the longer answer, we have this podcast.

For this podcast, I spoke with six of the leading experts in autonomous weapons. You’ll hear from defense expert Paul Scharre, who recently released the book Army of None: Autonomous Weapons and the Future of War. We discuss the history of autonomous and semi-autonomous weaponry, which dates back to WWII, as well as some of the more nuanced issues today that often come up for debate. AI researcher Toby Walsh looks at lethal autonomous weapons from a more technical perspective, considering the impact of autonomous weapons on society, and also the negative effects they could have for AI researchers if AI technology is used to kill people. Richard Moyes, with Article 36, coined the phrase meaningful human control, which is what much of the lethal autonomous weapons debate at the United Nations now focuses on. He describes what that means and why it’s important. Mary Wareham and Bonnie Docherty joined from Human Rights Watch, and they’re also with the Campaign to Stop Killer Robots. They talk about the humanitarian impact of lethal autonomous weapons and they explain the process going on at the United Nations today as efforts move toward a ban. Finally, my interviews end with Peter Asaro with the International Committee for Robot Arms Control and also the Campaign to Stop Killer Robots. Peter considers the issue of lethal autonomous weapons from an ethical and legal standpoint, looking at the impact killer robots could have on everything from human dignity to war crimes.

But I’ll let each of them introduce themselves better when their interviews begin. And because this podcast is so long, in the description, we’ve included the times that each interview starts, so that you can more easily jump around or listen to sections as you have time.

One quick, final point to mention is that everyone was kind enough to join at the last minute, which means not all of the audio is perfect. Most of it is fine, but please bear with us if you can hear people chattering in the background or any other similar imperfections.

And now for the first interview with Paul Scharre.

Paul: I’m Paul Scharre. I’m a senior fellow and director of the Technology and National Security Program at the Center for a New American Security. We’re a Washington, D.C.-based national security think tank that’s an independent bipartisan research organization.

Ariel: You have a background in weaponry. You were in the military, correct?

Paul: Yeah. I served about five and a half years in the US Army as a Ranger and a civil affairs team leader. I did multiple tours to Iraq and Afghanistan, and then I worked for several years after that in the Pentagon in the Office of the Secretary of Defense, where I actually worked on policy issues for emerging weapons technologies, including autonomous weapons.

Ariel: Okay. One of the very first questions that I want to start with is, how do you define an autonomous weapon?

Paul: That’s sort of the million-dollar question in a lot of ways. I don’t want to imply that all of the debate around autonomous weapons is a misunderstanding of semantics. That’s not true at all. There are clearly people who have very different views on what to do about the technology, but it is a big complicating factor because I have certainly seen, especially at the United Nations, very heated disagreements where it’s clear that people are just talking past each other in terms of what they’re envisioning.

When you say the term “autonomous weapon,” it conjures all sorts of different ideas in people’s minds. Some people envision super advanced intelligent machines that have human-like or superhuman intelligence, something like a Terminator or Cylon from science fiction. Other people are envisioning something that might be very simple and doable today, like a Roomba with a gun on it.

Both of those things are probably really bad ideas but for very different kinds of reasons. And I think that that’s a complicating factor. So one of the dimensions of autonomy that people tend to get fixated on is how smart the weapon system is. I actually don’t think that that’s a useful way to define an autonomous weapon. Sometimes I’ll hear people say things like, “Well, this is not an autonomous weapon. This is an automated weapon because of the level of sophistication.” I don’t think that’s very helpful.

I think it’s much better, actually, to focus on the functions that the weapon is performing on its own. This is similar to the approach that the International Committee of the Red Cross has, which focuses on critical functions in weapons systems. The way that I define it in my book is I basically define an autonomous weapon as one that can complete an entire engagement cycle on its own. That is to say, it has all of the functionality needed to search for targets, to identify them, to make a decision about whether or not to attack them, and then to start the engagement and carry through the engagement all by itself.

So there’s no human in this loop, this cognitive loop, of sensing and deciding and acting out on the battlefield all by itself. That defines it in such a way that there are some things — and this is where it gets into some of the tricky definitional issues — there are weapons that have been around since World War II that I would call semi-autonomous weapons that have some degree of autonomy, that have some sensors on board. They can detect the enemy, and they can make some rudimentary kinds of actions, like maneuvering towards the enemy.

Militaries generally call these “homing munitions.” They’re torpedoes or air-to-air missiles or surface-to-air, air-to-ground missiles. They have sensors on them that might use sonar or radar or acoustic signatures. They can sense that the enemy is there, and then they use those sensors to maneuver towards the enemy to strike the target. These are generally launched by people at targets where the human knows there’s a target there.

These were originally invented in World War II by the Germans to hit Allied ships in the submarine wars in the Atlantic. You can imagine there’s a technical challenge in trying to hit a moving ship. In a submarine, you’re trying to fire a torpedo at it and you might miss. So the first versions of these had microphones that could listen to the sound of the propellers from Allied ships and then steer towards where the sound was greatest so they could hit the ship.

In those cases — and this is still the case in the ones that are used today — humans see the target or have some indication of the target, maybe from a radar or sonar signature. And humans say, “There’s something out there. I want to launch this weapon to go attack it.” Those have been around for 70 years or so. I bring them up because there are some people who sometimes say, “Well, look. These autonomous weapons already exist. This is all a bunch of hullaballoo about nothing.”

I don’t think that’s really true. I think that a lot of the weapons systems that you see concern about going forward, would be things that will be quite qualitatively different, things that are going out over a wide area and searching for targets on their own, where humans don’t necessarily know where the enemy is. They might have some suspicion that the enemy might be in this area at this point in time, but they don’t know, and they launch the weapon to then find the enemy. And then, without radioing back to a human for approval, that weapon is delegated the authority to attack on its own.

By and large, we don’t see weapons like this in existence today. There are some exceptions. The Israeli Harpy drone or loitering munition is an exception. There were a couple experimental US systems in the ’80s and ’90s that are no longer in service. But this isn’t something that is in widespread use. So I do think that the debate about where we’re going in the future is at least a very valid one, and we are on the cusp of, potentially, things that will be quite different than anything we’ve seen before in warfare.

Ariel: I want to ask a quick question about the Harpy and any other type of weapon similar to that. Have those actually been used to kill anyone yet, to actually identify a target and kill some enemy? Or are they still just being used for identifying and potentially targeting people, but it’s still a human who is making the final decision?

Paul: That’s a great question. To the best of my knowledge, the Israeli Harpy has not been used in its fully autonomous mode in combat. So a couple things about how the Harpy functions. First of all, it doesn’t target people per se; it targets radars. Now, having said that, if a person is standing next to a radar that it targets, they’re probably going to be killed. But it’s not looking for individual persons. It’s looking for radar signatures and then zeroing in on them.

I mention that as important for two reasons. One, sometimes in some of the concerns that people raise about autonomous weapons, it can sometimes be unclear, at least to a listener, whether they are concerned about specifically weapons that would target humans or any weapon that might target anything on the battlefield. So that’s one consideration.

But, also, from sort of a practicality standpoint, it is easier to identify radar signatures more accurately than people who, of course, in many modern conflicts are not wearing uniforms or insignia or the things that might clearly identify them as a combatant. So a lot of the issues around distinction and accurately discriminating between combatants and noncombatants are harder for weapons that would target people.

But the answer to the question is a little bit tricky because there was an incident a couple years ago where a second-generation version of the Harpy called the Harop, or Harpy II, was used in the Nagorno-Karabakh region in the conflict there between Azerbaijan and Armenia. I think it was used by Azerbaijan and used to attack what looked like — I believe it was a bus full of fighters.

Now, by all accounts, the incident was one of actual militants being targeted — combatants — not civilians. But here was a case where it was clearly not a radar. It was a bus that would not have been emitting radar signatures. Based on my understanding of how the technology works, the Harop, the Harpy II, has a human-in-the-loop mode. The first-generation Harpy, as far as I understand, is all autonomous. The second-generation version definitely has a human-in-the-loop mode. It looks like it’s not clear whether it also has an autonomous version.

In writing the book, I reached out to the manufacturer for more details on this, and they were not particularly forthcoming. But in that instance, it looks like it was probably directed by a human, that attack, because as far as we know, the weapon does not have the ability to autonomously target something like a bus.

Ariel: Okay.

Paul: That’s a really long-winded answer. This is what actually makes this issue super hard sometimes because they depend a lot on the technical specifications of the weapon, which a) are complicated and b) are not always very transparent. Companies are not always very transparent publicly about how their weapons systems function.

One can understand why that is. They don’t want adversaries to come up with methods of fooling them and countermeasures. On the other hand, for people who are interested in understanding how companies are pushing the bounds of autonomy, that can be very frustrating.

Ariel: One of the things that I really like about the way you think is that it is very nuanced and takes into account a lot of these different issues. I think it’s tempting and easy, and I don’t want to make it sound like I’m being lazy, because I personally support banning lethal autonomous weapons. But I think it’s a really complicated issue, and so I’d like to know more: what are your thoughts on a ban?

Paul: There are two areas on this topic where I think it gets really complicated and really tricky. Say you start with a broad principle like, “Humans should be making decisions about lethal force,” or, “Only humans should be deciding to take human life.” There are two areas where you try to … How do I put them into practice? And then you really run into some serious challenges.

And I’m not saying that makes it impossible, because for difficult answers you have to really sort of roll up your sleeves and get into some of the details of the issue. One is, how do you translate a broad concept like that into technical specifications of a weapon? If you start with an idea and say, “Well, only humans should be responsible for taking human life,” that seems like a reasonable idea.

How do you translate that into technical guidance that you give weapons developers over what they can and cannot build? That’s actually really hard, and I say that as having done this when I worked at the Pentagon and we tried to write guidance that was really designed to be internal to the US Defense Department and to give guidance to defense companies and to military researchers on what they could build.

It was hard to translate some of these abstract concepts like, “Humans should decide the targets,” to technical ideas. Well, what does that mean for how long the weapon can loiter over a target area or how big its sensor field should be or how long it can search for? You have to try to figure out how to put those technical characteristics into practice.

Let me give you two examples of a weapon to illustrate how this can be challenging. You might imagine a weapon today where a human says, “Ah, here’s an enemy target. I want to take that target out.” They launch a missile, and the missile flies towards the target. Let’s say it’s a tank. The missile uses a millimeter-wave seeker on the tank. It’s an active seeker, sends out millimeter-wave radar signatures to see the tank and illuminate it and sort of highlight it from the background and then zero in on the tank, because the tank’s moving and they need to have the sensor to hit the moving tank.

If the weapon and the sensor can only search for a very limited space in time and geography, then you’ve constrained the autonomy enough that the human is still in control of what it’s targeting. But as you start to open that aperture up, and maybe it’s no longer that it’s searching for one minute in a one-kilometer area, it’s now searching for eight hours over 1,000 kilometers, now you have a completely different kind of weapon system. Now it’s one that’s much more like … I make the analogy in the book of the difference between a police dog that might be set loose to go chase down a suspect, where the human says, “There’s the suspect. Dog, go get them,” versus a mad dog roaming the streets attacking anyone at will.

You have two different paradigms, but where do you draw the line in between? And where do you say, “Well, is 1 minute of loiter time, is it 2 minutes, is it 10 minutes, is it 20 minutes? What’s the geographic area?” It’s going to depend a lot on the target, the environment, what kind of clutter is in the environment. What might be an appropriate answer for tanks in an urban combat setting might be very different than naval ships on the high seas or submarines underwater or some other target in a different environment.

So that’s one challenge, and then the other challenge, of course, which is even more contested, is just sort of, “What’s the feasibility of a ban and getting countries to come together to actually agree to things?” because, ultimately, countries have militaries because they don’t trust each other. They don’t trust international law to constrain other countries from aggressive action. So regardless of whether you favor one country or another, you consider yourself an American or a Russian or a Chinese or a French or Israeli or Guinean or someone else, countries in general, they have militaries because they don’t trust others.

That makes … Even if you get countries to sign up to a ban, there’s a major challenge in getting them to actually adhere to it, because countries are always fearful about others breaking these rules and cheating and getting the upper hand.

Ariel: We have had other bans. We’ve banned biological weapons, chemical weapons, landmines, space weapons. Do you see this as different somehow?

Paul: Yeah. So one of the things I go through in my book is, as comprehensive as I can come up with, a list of all of the attempts to regulate and control emerging technologies dating back to antiquity, dating back to ancient Indian prohibitions in the Hindu Laws of Manu or the Mahabharata on poisoned and barbed arrows and fire-tipped weapons.

It’s really a mixed bag. I like to say that there’s sort of enough examples of both successes and failures for people to pick whichever examples they want for whatever side they’re arguing for because there are many examples of successful bans. And I would say they’re largely successful. There are some examples of isolated incidences of people not adhering to them. Very few bans are universally adhered to. We certainly have Bashar al-Assad using chemical weapons in Syria today.

But bans that have been largely successful, in that they’ve at least had a major effect in reducing these weapons, include landmines, cluster munitions, blinding lasers, biological weapons, chemical weapons, using the environment as a weapon, placing nuclear weapons on the seabed or in orbit, placing any weapons of any kind on the moon or Antarctica, various regulations during the Cold War, anti-ballistic missile systems, intermediate-range ground-launched nuclear missiles, and then, of course, regulations on the number of nuclear weapons.

So there are a lot of successful examples. Now, on the other side of the coin, there are failed attempts to ban, famously, the crossbow, and that’s often brought up in these conversations. But in more recent memory, there are the attempts of the 20th century to ban and regulate aircraft and air-delivered weapons and submarine warfare, and of course the failure of attempts to ban poison gas in World War I. So there are examples on the other side of the ledger as well.

One of the things that I try to do in my book is get beyond sort of just picking examples that people like, and say, “Well, is there a pattern here? Are there some common conditions that make certain bans more likely to succeed or fail?” There’s been great scholarship done by some others before me that I was able to build on. Rebecca Crootof and Sean Welsh have done work on this trying to identify some common patterns.

I think that that’s a … If you want to look at this analytically, that’s a fruitful place to start, is to say, “Why do some bans succeed and some fail?” And then, when you’re looking at any new technology, whether it’s autonomous weapons or something else, where do they fall on this spectrum, and what does that suggest about the feasibility of certain attempts at regulation versus others?

Ariel: Can you expand on that a little bit? What have you found, or what have they found in terms of patterns for success versus failure for a ban?

Paul: I think there’s a couple criteria that seem to matter. One is the clarity of a ban is really crucial. Everyone needs to have a clear agreement on what is in and what is out. The simpler and clearer the definition is, the better. In some cases, this principle is actually baked into the way that certain treaties are written. I think the ban on cluster munitions is a great example of this, where the Cluster Munition Convention has a very, very simple principle in the treaty. It says, “Cluster munitions are banned,” full stop.

Now, if you go into the definition, now there’s all sorts of nuance about what constitutes a cluster munition or not. That’s where they get into some of the horse trading with countries ahead of time. But sort of the principle is no cluster munitions. The archetype of this importance of clarity comes in the success of restraint among European powers in using chemical weapons against each other in World War II. All sides had them. They didn’t use them on the battlefield against each other. Of course, Germany used them in the Holocaust and there were some other isolated incidences in World War II of use against others who didn’t have them.

But the European powers all had tens of thousands of tons of mustard gas stockpiled, and they didn’t use it against each other. At the outset of World War II, there were also attempts to restrain aerial bombing of cities. It was widely viewed as reprehensible. It was also illegal under international law at the time, and there were attempts on all sides to refrain from that. At the outset of the war, in fact, they did, and Hitler actually put a directive to the Luftwaffe. I talk about this a little bit in the book, although unfortunately, a lot of the detail on some of this stuff got cut for space, which I was disappointed by.

Hitler put a directive to the Luftwaffe saying that they were not to engage in bombing of civilian targets, that is, terror bombing, in Britain; they were only to engage in bombing military targets, not because he was a humanitarian, but because he was concerned about Britain retaliating. This attempt at restraint failed when, in the middle of the night, a German bomber strayed off course and bombed central London by mistake. In retaliation, Churchill ordered the bombing of Berlin. Hitler was incensed and gave a speech the following day announcing the launch of the London Blitz.

So here’s an example where there was some slippage in the principle of what was allowed and what was not, and so you had a little bit of accidental crossing of the line in conflict. So the sharper and clearer this line is, the better. You could extrapolate from that and say it’s likely that if, for example, what World War II powers had agreed to in World War II was that they could only use poison gas against military targets but not against civilian targets, that it would have quickly escalated to civilian targets as well.

In the context of autonomous weapons, that’s one of the arguments why you’ve seen some advocates of a ban say that they don’t support what is sometimes called a partition treaty, which is something that would create a geographic partition that would say you could only use autonomous weapons outside of populated areas. What some advocates of a ban have said is, “Look, that’s never going to hold in combat.” That sounds good. I’ve heard some international humanitarian lawyers say that, “Oh, well, this is how we solve this problem.” But in practice, I agree that’s not likely to be very feasible.

So clarity’s important. Another factor is the relative military value of a weapon versus its perceived horribleness. I think, again, a good case in point here is the difference in the international community’s success in largely getting most countries to give up chemical weapons, but the lack of success on nuclear weapons. Nuclear weapons by any reasonable measure are far more terrible in terms of their immediate and long-lasting effects on human life and the environment, but they have much more military value, at least perceived military value. So countries are much more reluctant to give them up.

So that’s another factor, and then there are some other ones that I think are fairly straightforward but also matter, things like the access to the weapon and the number of actors that are needed to get agreement. If only two countries have the technology, it’s easier to get them on board than if it’s widely available and everyone needs to agree. But I think those are some really important factors that are significant.

One of the things that actually doesn’t matter that much is the legality of a weapons treaty. I’m not saying it doesn’t matter at all, but you see plenty of examples of legally binding treaties that are violated in wartime, and you see some examples, not a ton, but some examples of mutual restraint among countries when there is no legally binding agreement or sometimes no agreement at all, no written agreement. It’s sort of a tacit agreement to refrain from certain types of competition or uses of weapons.

All of those, I think, are really important factors when you think about the likelihood of a ban actually succeeding on any weapons — not just autonomous weapons, any weapons — but the likelihood of a ban actually succeeding in wartime.

Ariel: I’m probably going to want to come back to this, but you mentioned something that reminded me of another question that I had for you. And that is, in your book, you mentioned … I don’t remember what the weapon was, but it was essentially an autonomous weapon that the military chose not to use and then ended up giving up because it was so costly, and ultimately they didn’t trust it to make the right decisions.

I’m interested in this idea of the extent to which we trust the weapons to do whatever it is that they’re tasked with if they’re in some sort of autonomous mode, and I guess where we stand today with various weapons and whether militaries will have increasing trust in their weapons in the future.

Paul: The case study I think you’re referring to was an anti-ship missile called the Tomahawk anti-ship missile, or TASM, that was in service by the US Navy in the 1980s. That I would classify as an autonomous weapon. It was designed to go over the horizon to attack Soviet ships, and it could fly a search pattern. I think, actually, in the book I included the graphic of the search pattern that it would fly to look for Soviet ships.

The concern was that the way this would work in anti-surface warfare is the navy would send out patrol aircraft because they’re much faster. They have much longer range than ships. And they would scout for other enemy ships. The principle in a wartime environment is patrol aircraft would find a Soviet ship and then radio back to a destroyer the Soviet ship’s location, and the destroyer would launch a missile.

Now, the problem was, by the time the missile got there, the ship would have moved. So the ship would now have what the military would call an area of uncertainty that the ship might be in. They wouldn’t have the ability to continuously track the ship, and so what they basically would do was the missile would fly a search pattern over this area of uncertainty, and when it found the ship, it would attack it.

Now, at the time in the 1980s, the technology was not particularly advanced and it wasn’t very good at discriminating between different kinds of ships. So one of the concerns was that if there happened to be another kind of ship in the area that was not an enemy combatant, it still might attack it if it was within this search pattern area. Again, it’s originally cued by a human that had some indication of something there, but there was enough uncertainty that it flies this pattern on its own. And it’s only for that reason that I call it an autonomous weapon, because there was a great amount of uncertainty about what it might hit and whether it might do so accurately. And once launched, it would find and attack all on its own.

So it was never used, and there was great hesitance about it being used. I interview a retired US Navy officer who was familiar with it at the time, and he talks about how they didn’t trust that its targeting was good enough that, once they let it loose, it would hit the right target. Moreover, there was the secondary problem: it might hit the wrong target, sort of a false positive, if you will, but it also might miss the Soviet ship, in which case they would have simply wasted a weapons system.

That’s another problem that militaries have, which is missiles are costly. They don’t have very many of them in their inventory. Particularly if it’s something like a ship or an aircraft, there’s only so many that they can carry physically on board. So they don’t want to waste them for no good reason, which is another practical, operational consideration. So eventually it was taken out of service for what I understand to be all of these reasons, and that’s a little bit of guesswork, I should say, as to why it was taken out of service. I don’t have any official documentation saying that, but that’s at least, I think, a reasonable assumption about some of the motivating factors based on talking to people who were familiar with it at the time.

One important dynamic that I talk about in the book is that the wasting-the-weapon problem is really acute for missiles that are not recoverable. You launch it, you’re not going to get it back. If the enemy’s not there, then you’ve just wasted this thing. That changes dramatically if you have a drone that can return back. Now, all of the concerns about it hitting the wrong target and civilian casualties, those still exist and those are very much on the minds of at least Western military professionals who are concerned about civilian casualties and countries that care about the rule of law more broadly.

But this issue of wasting the weapon is less of an issue when you have something that’s recoverable and you can send it out on patrol. So I think it’s possible, and this is a hypothesis, but it’s possible that as we see more drones and combat drones in particular being put into service and intended to be used in contested areas where they may have jammed communications, that we start to see that dynamic change.

To your question about trust, I guess I’d say that there is a lot of concern about trust in these systems, at least among the military professionals that I talk to in the United States and in other Allied countries, NATO countries or Australia or Japan. In fact, I see much more confidence … I’m going to make a broad generalization here, okay? So forgive me, but in general I would say that I see much more confidence in the technology coming from the engineers who are building them at military research labs or at defense companies than from the military professionals in uniform who have to push the button and use them. They’re a little bit more skeptical of wanting to actually trust these and delegate what they see as their responsibility to this machine.

Ariel: What do you envision, sort of if we go down current trajectories, as the future of weaponry specifically as it relates to autonomous weaponry and potentially lethal autonomous weaponry? And to what extent do you think that international agreements could change that trajectory? And maybe, even, to what extent do you think countries might possibly even appreciate having guidelines to work within?

Paul: I’ll answer that, but let me first make an observation about most of the dialogue in the space. There’s sort of two different questions wrapped up in there. What is the likely outcome of a future of autonomous weapons? Is it a good future or a bad future? And then another one is, what is the feasibility of some kind of international attempt to control or regulate or limit these weapons? Is that possible or unlikely to succeed?

What I tend to hear is that people on all sides of this issue tend to cluster into two camps. They tend to either say, “Look, autonomous weapons are horrible and they’re going to cause all these terrible effects. But if we just all get together, we can ban them. All we need to do is just … I don’t know what’s wrong with countries. We need to sit down. We need to sign a treaty and we’ll get rid of these things and our problems will be solved.”

Other people in the opposite camp say, “Bans don’t work, and anyways, autonomous weapons would be great. Wouldn’t they be wonderful? They could make war so great, and humans wouldn’t make mistakes anymore, and no innocent people would be killed, and war would be safe and humane and pristine.” Those things don’t necessarily go together. So it’s entirely possible … Like if you sort of imagine a two-by-two matrix. It’s really convenient that everybody’s views fit into those boxes very harmoniously, but it may not be possible.

I suspect that, on the whole, autonomous weapons that have no human control over targeting are not likely to make war better. It’s hard for me to say that would be a better thing. I can see why militaries might want them in some instances. I think some of the claims about the military value might be overblown, but there are certainly some situations where you can imagine they’d be valuable. I think it kind of remains to be seen how valuable and in what context, but you can imagine that.

But in general, I think that humans add a lot of value to making decisions about lethal force, and we should be very hesitant to take humans away. I also am somewhat skeptical of the feasibility of actually achieving restraint on these topics. I think it’s very unlikely the way the current international dynamics are unfolding, which is largely focused on humanitarian concerns and berating countries and telling them that they are not going to build weapons that comply with international humanitarian law.

I just don’t think that’s a winning argument. I don’t think that resonates with most of the major military powers. So I think that when you look at, actually, historical attempts to ban weapons, that right now what we’re seeing is a continuation of the most recent historical playbook, which is that elements of civil society have kind of put pressure on countries to ban certain weapons for humanitarian reasons. I think it’s actually unusual when you look at the broader historical arc. Most attempts to ban weapons were driven by great powers and not by outsiders, and most of them centered on strategic concerns, concerns about someone getting an unfair military advantage, or weapons making war more challenging for militaries themselves or making life more challenging for combatants themselves.

Ariel: When you say that it was driven by powers, do you mean you’d have, say, two powerful countries and they’re each worried that the other will get an advantage, and so they agree to just ban something in advance to avoid that?

Paul: Yeah. There’s a couple time periods that kind of seem most relevant here. One would be a flurry of attempts to control weapons that came out of the Industrial Revolution around the dawn of the 20th century. These included air balloons, or basically air-delivered weapons from balloons or airplanes, submarines, poison gas, what was called fulminating projectiles. You could think of projectiles or bullets that have fire in them or are burning, or exploding bullets, sawback bayonets. There was some restraint on their use in World War I, although it wasn’t ever written down, but there seems to be a historical record of some constraint there.

That was one time period, and at the time, that was all driven by the great powers at the time. So these were generally driven by the major European powers and then Japan as Japan sort of came rising on the international stage and particularly was involved as a naval power in the naval treaties. The Washington Naval Treaty is another example of this that attempts to control a naval arms race.

And then, of course, there were a flurry of arms control treaties during the Cold War driven by the US and the USSR. Some of them were bilateral. Many of them were multilateral but driven principally by those two powers. So that’s not to say there’s anything wrong with the current models of NGOs in civil society pushing for bans, because it’s worked and it’s worked in landmines and cluster munitions. I’m not sure that the same conditions apply in this instance, in large part because in those cases, there was real humanitarian harm that was demonstrated.

So you could really, I think, fairly criticize countries for not taking action because people were being literally maimed and killed every day by landmines and cluster munitions, whereas here it’s more hypothetical, and so you see people sort of extrapolating to all sorts of possible futures and some people saying, “Well, this is going to be terrible,” but other people saying, “Oh, wouldn’t it be great,” and some say it’d be wonderful.

I’m just not sure that the current playbook that some people are using, which is to sort of generate public pressure, will work when the weapons are still hypothetical. And, frankly, they sound like science fiction. There was this recent open letter that FLI was involved in, and I was sitting in the break room at CNN before doing a short bit on this and talking to someone about this. They said, “Well, what are you going on about?” I said, “Well, some AI scientists wrote a letter saying they weren’t going to build killer robots.”

I think to many people it just doesn’t sound like a near-term problem. That’s not to say that it’s not a good thing that people are leaning into the issue. I think it’s great that we’re seeing people pay attention to the issue and anticipate it and not wait until it happens. But I’m also just not sure that the public sentiment to put pressure on countries will manifest. Maybe it will. It’s hard to say, but I don’t think we’ve seen it yet.

Ariel: Do you think in terms of considering this to be more near term or farther away, are military personnel also in that camp of thinking that it’s still farther away, or within militaries is it considered a more feasible technology in the near term?

Paul: I think it depends a little bit on how someone defines the problem. If they define an autonomous weapon as human-level intelligence, then I think there’s a wide agreement. Well, at least within military circles. I can’t say wide agreement. There’s probably a lot of people on the podcast who might, maybe, have varying degrees of where they think that might be in terms of listeners.

But in military circles, I think there’s a perception that that’s just not a problem in the near term at all. If what you mean is something that is relatively simple but can go over a wide area and identify targets and attack them, I think many military professionals would say that the technology is very doable today.

Ariel: Have you seen militaries striving to create that type of weaponry? Are we moving in that direction, or do you see this as something that militaries are still hesitating to move towards?

Paul: That’s a tricky question. I’ll give you my best shot at understanding the answer to that because I think it’s a really important one, and part of it is I just don’t know because there’s not great transparency in what a lot of countries are doing. I have a fairly reasonable understanding of what’s going on in the United States but much less so in other places, and certainly in countries like authoritarian regimes like Russia and China, it’s very hard to glean from the outside what they’re doing or how they’re thinking about some of these issues.

I’d say that almost all major military powers are racing forward to invest in more robotics and autonomous artificial intelligence. I think for many of them, they have not yet made a decision whether they will cross the line to weapons that actually choose their own targets, to what I would call an autonomous weapon. I think for a lot of Western countries, they would agree that there’s a meaningful line there. They might parse it in different ways.

The only two countries that have really put any public guidance out on this are the United States and the United Kingdom, and they actually define autonomous weapon in quite different ways. So it’s not clear from that how to interpret how they will treat this going forward. US defense leaders have said publicly on numerous occasions that their intention is to keep a human in the loop, but then they also will often caveat that and say, “Well, look. If other countries don’t, we might be forced to follow suit.”

So it’s sort of in the loop for now, but it’s not clear how long “for now” might be. I think it’s not clear to me whether countries like Russia and China even see the issue in the same light, whether they even see a line in the same place. And at least some of the public statements out of Russia, for example, talking about fully roboticized units or some Russian defense contractors claiming to have built autonomous weapons that can do targeting on their own, would suggest that they may not even see the line in the same way.

In fairness, that is a view that I hear among some military professionals and technologists. I don’t want to say that’s the majority view, but it is at least a significant viewpoint where people will say, “Look, there’s no difference between that weapon, an autonomous weapon that can choose its own targets, and a missile today. It’s the same thing, and we’re already there.” Again, I don’t totally agree, but that is a viewpoint that’s out there.

Ariel: Do you think that the fact that countries have these differing viewpoints is a good reason to put more international pressure on developing some sort of regulations to try to bring countries in line, bring everyone onto the same page?

Paul: Yeah. I’m a huge supporter of the process that’s been going on with the United Nations. I’m frustrated, as many are, about the slowness of the progress. Part of this is a function of diplomacy, but part of this is just that they haven’t been meeting very often. When you add up all of the times over the last five years, it’s maybe five or six weeks of meetings. It’s just not very much time they spend together.

Part of it is, of course … Let’s be honest. It’s deliberate obstinacy on the part of many nations who want to slow the progress of talks. But I do think it would be beneficial if countries could come to some sort of agreement about rules of the road, about what they would see as appropriate in terms of where to go forward.

My view is that we’ve gotten the whole conversation off on the wrong foot by focusing on this question of whether or not to have a legally binding treaty, whether or not to have a ban. If this was me, that’s not how I would have framed the discussion from the get-go, because what happens is that many countries dig in their heels because they don’t want to sign on to a treaty. So they just start off from a position of, “I’m opposed.” They don’t even know what they’re opposed to. They’re just opposed because they don’t want to sign a ban.

I think a better conversation to have would be to say, “Let’s talk about the role of autonomy and machines and humans in lethal decision-making in war going forward. Let’s talk about the technology. Let’s talk about what it can do, what it can’t do. Let’s talk about what humans are good at and what they’re not good at. Let’s think about the role that we want humans to play in these kinds of decisions on the battlefield. Let’s come up with a view of what we think ‘right’ looks like, and then we can figure out what kind of piece of paper we write it down on, whether it’s a piece of paper that’s legally binding or not.”

Ariel: Talking about what the technology actually is and what it can do is incredibly important, and in my next interview with Toby Walsh, we try to do just that.

Toby: I’m Toby Walsh, I’m a Scientia Professor of Artificial Intelligence at the University of New South Wales, which is in Sydney, Australia. I’m a bit of an accidental activist, in the sense that I’ve been drawn in, as a responsible scientist, to the conversation about the challenges, the opportunities, the risks that artificial intelligence poses in fighting war. And there’s many good things that AI’s going to do in terms of reducing casualties and saving lives, but equally, I’m very concerned, like many of my colleagues are, about the risks that it poses, especially when we hand over full control to computers and remove humans from the loop.

Ariel: So that will segue nicely into the first question I had for you, and that was what first got you thinking about lethal autonomous weapons? What first gave you reason for concern?

Toby: What gave me concern about the development of lethal autonomous weapons was to see prototype weapons being developed. And knowing the challenges that AI poses — we’re still a long way away from having machines that are as intelligent as humans, and knowing the limitations, and being very concerned that we were handing over control to machines that weren’t technically capable, and certainly weren’t morally capable, of making the right choices. And therefore, too, I felt a responsibility, as any scientist, that we want AI to be used for good and not for bad purposes. Unfortunately, like many technologies, it’s completely dual use. They’re pretty much the same algorithms that are going to go into your autonomous car, that are going to identify, track, and avoid pedestrians and cyclists, are going to go into autonomous drones that are going to identify combatants, track them, and kill them. It’s a very small change to turn one algorithm into the other. And we’re going to want autonomous cars, they’re going to bring great benefits to our lives, save lots of lives, give mobility to the elderly, to the young, to the disabled. So there can be great benefits for those algorithms, but equally, the same algorithms can be repositioned and used to make warfare much more terrible and much more terrifying.

Ariel: And with AI, we’ve seen some breakthroughs in recent years, just generally speaking. Do any of those give you reason to worry that lethal autonomous weapons are closer than maybe we thought they might have been five or ten years ago? Or has the trajectory been consistent?

Toby: The recent breakthroughs have to be put into context, in that they’ve been in things like games, like the game of Go, very narrowly focused tasks without uncertainty. The real world doesn’t interfere when you’re playing a game of Go; it has very precise rules and very constrained actions that you need to do and things that you need to think about. And so it’s good to see progress in these narrow domains. We’re still not making much progress, there’s still a huge amount to be done to build machines that are as intelligent as us. But it’s not machines as intelligent as us that I’m very worried about, although in 50 or 100 years’ time, when we have them, that will be something that we’ll have to think about then.

It’s actually stupid AI, the fact that we’re already thinking about giving responsibility to quite stupid algorithms that really cannot make the right distinctions, either in a technical sense, in terms of being able to distinguish combatants and civilians as required by international humanitarian law. And also from a moral ground, that they really can’t decide things like proportionality, they can’t make the moral distinctions that humans have. They don’t have any of the things like empathy and consciousness that allow us to make those difficult decisions that are made in the battlefield.

Ariel: If we do continue on our current path and we aren’t able to get a ban on these weapons, what concerns do you have? What do you fear will happen? Or what do you anticipate? What type of weapons?

Toby: The problem is, I think with the debate, is that people try and conflate the concerns that we have into just one concern. And there’s different concerns at different points in time and different developments of the technology.

So the concerns I have in the next 10 years or so are definitely not the concerns I would have in 50 years’ time. Now the concerns I would have in the next 10 years or so are largely around incompetence. The machines would not be capable of making the right distinctions. And later on, as the machines become more competent, there are different concerns. They would actually now change the speed, the duration, the accuracy of war. And they would be very terrible weapons, and any ethical safeguards that we could, at that point, build in might be removed by bad actors. Sadly, there are plenty of bad actors out there who would be willing to remove any of the ethical safeguards that we might build in. So there’s not one concern. I think, unfortunately, when you hear the discussion, often people try to distill it down to just a single concern at a single point in time. And depending on the state of the technology, there are different concerns as the technology gets more sophisticated and more mature. But to begin with, I would be very concerned that we will introduce rather stupid algorithms into the battlefield and they couldn’t make the right moral and right technical distinctions that are required under IHL.

Ariel: Have you been keeping track at all of what sorts of developments have been coming out of different countries?

Toby: You can see, if you just go onto YouTube, that there are prototype weapons pretty much in every theater of battle. In the air, there are autonomous drones; BAE Systems has an autonomous drone that’s now been under development for a number of years. And on the sea, the US Navy launched, more than a year ago now, its first fully autonomous ship. And interestingly, when it was launched, they said it would just be used for defensive measures, hunting for mines, hunting for submarines, and now they’re talking about putting weapons on it. Under the sea, we have an autonomous submarine the size of a bus that’s believed to be halfway across the Pacific, fully autonomously. And on land there are a number of different autonomous weapons. Certainly there are prototypes of autonomous tanks, autonomous sentry robots, and the like. So there is a bit of an arms race happening and it’s certainly very worrying to see that we’re sort of locked into one of these bad equilibria, where everyone is racing to develop these weapons, in part just because the other side is.

China is definitely one of the countries to be worried about. It’s made very clear its ambitions to seek economic and military dominance through the use, in large part, of technologies like artificial intelligence, and it’s investing very heavily to do that. The military and commercial companies are very tightly tied together. That will give it quite a unique position, perhaps even some technical advantages in the development of AI, especially in the battlefield. So it was quite surprising, all of us at the UN meeting in April were pretty surprised when China came out and called for a ban on the deployment of autonomous weapons. It didn’t say anything about development of autonomous weapons, so that’s probably not as far as I would like countries to go because if they’re developed, then you still run the risk that they will be used, accidentally or otherwise. The world is still not as safe as if they’re not actually out there with their triggers waiting to go. But it’s interesting to see that they made that call. It’s hard to know whether they’re just being disruptive or whether they really do see the serious concern we have.

I’ve talked to my colleagues, academic researchers in China, and they’ve been, certainly in private, sympathetic to the cause of regulating autonomous weapons. Of course, unfortunately, China is a country in which it’s not possible, in many respects, to talk freely. And so they’ve made it very clear that it would be a career-killing move for them, perhaps, to speak publicly like scientists in the West have done about these issues. Nevertheless, we have drawn signatures from Hong Kong, where it is possible to speak a bit more freely, which I think demonstrates that, within the scientific community internationally, across nations, there is actually broad support for these sorts of actions. But the local politics may prevent scientists from speaking out in their home country.

Ariel: A lot of the discussion around lethal autonomous weapons focuses on the humanitarian impact, but I was wondering if you could speak at all to the potential destabilizing effect that they could have for countries?

Toby: One of the aspects of autonomous weapons that I don’t think is discussed enough is quite how destabilizing they will be as a technology. They will be relatively easy, certainly cheap to get your hands on. As I was saying when I was in Korea most recently to the Koreans, the presence of autonomous weapons would make South Korea even less safe than it is today. A country like North Korea has demonstrated it’s willing to go to great lengths to attain atomic weapons. And it would be much easier for them to obtain autonomous weapons and that would put South Korea in a very difficult situation because if they were attacked by autonomous weapons and they weren’t able to defend themselves adequately, then that would escalate and we might well find ourselves in a nuclear conflict. One that, of course, none of us would like to see. So they will be rather destabilizing. The weapons will likely fall into the wrong hands; they’ll be used not just by the superpowers, they’ll be used by smaller nations, even rogue states. Potentially, they might even be used by terrorist organizations.

And then another final aspect that makes them very destabilizing is one of attribution. If someone attacks you with autonomous weapons, then it’s going to be very hard to know who’s attacked you. It’s not like you can bring one of the weapons down, you can open it up and look inside it. It’s not going to tell you who launched it. There’s not a radio signal you can follow back to a base to find out who’s actually controlling this. So it’s going to be very hard to work out who’s attacking you and the countries will deny, vehemently, that it’s them, even if they went and attacked you. So they will be perfect weapons of terror, perfect weapons for troubling nations to do their troubling with.

One other concern that I have as a scientist is the risk of the field receiving a bad reputation by the misuse of the technology. We’ve seen this in areas like genetically modified crops. The great benefits that we might have had from that technology (making crops more disease-resistant, more climate-resistant, which we need, in fact, to deal with the pressing problems that climate change and growing populations put on our planet) have been negated by the fact that people were distrustful of the technology. And we run a similar sort of risk, I think, with artificial intelligence. That if people see AI being used to fight terrible wars and to be used against civilians and other people, the technology will have a stain on it. And all the many good uses and the great potential of the technology might be at risk because people will turn against all sorts of developments of artificial intelligence. And so that’s another risk and another reason many of my colleagues feel that we have to speak out very vocally to ensure that we get the benefits and that the public doesn’t turn against the whole idea of AI being used to improve the planet.

Ariel: Can you talk about the difference between an AI weapon and an autonomous weapon?

Toby: Sure. There’s plenty of good things that the military can use artificial intelligence for. In fact, the U.S. military has historically been one of the greatest funders of AI research. There’s lots of good things you can use artificial intelligence for, in the battlefield and elsewhere. No one should risk life or limb clearing a minefield, a perfect job for a robot because if it goes wrong and blows up the robot, you can replace the robot easily. Equally, filtering through all the information coming at you, making sure that you can work out who are combatants and who are civilians, using AI to help you in a situation, once again, that’s a perfect job that will actually save lives, stop some of the mistakes that inevitably happen in the fog of war. And in lots of other areas in logistics and so on, there’s lots of good things in humanitarian aid that AI will be used for.

So I’m not against the use of AI in militaries, I think I can see great potential for it to save lives, to make war a little less dangerous. But there is a complete difference when we look at removing humans completely from the decision loop in a weapon and ending up with a fully autonomous weapon where it is the machine that is making the final decision as to who lives and who dies. And as I said before, that raises many technical, moral, and legal questions that we shouldn’t go down that line. And ultimately, I think there’s a very big moral argument, which is that we shouldn’t hand over those sorts of decisions, that would be taking us into a completely new moral territory that we’ve never seen before in our lives. Warfare is a terrible thing and we sanction it, and in part because we’re risking our own lives and it should be a matter of last resort, not something that we hand over easily to machines.

Ariel: Is there anything else that you think we should talk about?

Toby: I think we’d want to talk about whether regulating autonomous weapons, regulating AI, would hinder the benefits for peaceful or non-military uses. I’m very unconcerned, as are many of my colleagues, that if we regulate autonomous weapons that will actually hinder the development, in any way at all, of the peaceful and the good uses of AI. In fact, as I had mentioned earlier, I’m actually much more fearful that if we don’t regulate, there will be a backlash against the technology as a whole and that will actually hinder the good uses of AI. So I’m completely unconcerned, just like the bans on chemical weapons have not held back chemistry, the bans on biological weapons have not held back biology, the bans on nuclear weapons have not held back the development of peaceful uses of nuclear power. So I’m completely unconcerned, as many of my colleagues are, that regulating autonomous weapons will actually hold back the field in any way at all, in fact quite the opposite.

Ariel: Regulations for lethal autonomous weapons will be more effective if the debate is framed in a more meaningful way, so I’m happy Richard Moyes could talk about how the concept of meaningful human control has helped move the debate in a more focused direction.

Richard: I’m Richard Moyes, and I am Managing Director of Article 36, which is a non-governmental organization which focuses on issues of weapons policy and weapons law internationally.

Ariel: To start, you have done a lot of work, I think you’re credited with coining the phrase “meaningful human control.” So I was hoping you could talk a little bit about first, what are some of the complications around defining whether or not a human is involved and in control, and maybe if you could explain some of the human in the loop and on the loop ideas a little bit.

Richard: We developed and started using the term meaningful human control really as an effort to try and get the debate on autonomous weapons focused on the human element, the form and nature of human engagement that we want to retain as autonomy develops in different aspects of weapons function. First of all, that’s a term that’s designed to try and structure the debate towards thinking about that human element.

I suppose, the most simple question that we raised early on when proposing this term was really a recognition that I think everybody realizes that some form of human control would be needed over new weapon technologies. Nobody is really proposing weapon systems that operate without any human control whatsoever. At the same time, I think people could also recognize that simply having a human being pressing a button when they’re told to do so by a computer screen, without really having any understanding of what the situation is that they’re responding to, having a human simply pressing a button without understanding of the context, also doesn’t really involve human control. So even though in that latter situation, you might have a human in the loop, as that phrase goes, unless that human has some substantial understanding of what the context is and what the implications of their actions are, then simply a pro forma human engagement doesn’t seem sufficient either.

So, in a way, the term meaningful human control was put forward as a way of shifting the debate onto that human element, but also putting on the table this question of, well, what’s the quality of human engagement that we really need to see in these interactions in order to feel that our humanity is being retained in the use of force.

Ariel: Has that been successful in helping to frame the debate?

Richard: I think this sort of terminology, of course, different actors use different terms. Some people talk about necessary human control, or sufficient human control, or necessary human judgment. There’s different word choices there. I think there are pros and cons to those different choices, but we don’t tend to get too hung up on the specific wording that’s chosen there. The key thing is that these are seen bundled together as being a critical area now for discussion among states and other actors in multilateral diplomatic conversation about where the limits of autonomy in weapon systems lie.

I think coming out of the Group of Governmental Experts meeting of the Convention on Conventional Weapons that took place earlier this year, I think the conclusion of that meeting was more or less that this human element really does now need to be the focus of discussion and negotiation. So one way or another, I think the debate has shifted quite effectively onto this issue of the human element.

Ariel: What are you hoping for in this upcoming meeting?

Richard: Perhaps what I’m hoping for and what we’re going to get, or what we’re likely to get, might be rather different things. I would say I’d be hoping for states to start to put forward more substantial elaborations of what they consider the necessary human control, human element in the use of force to be. More substance on that policy side would be a helpful start, to give us material where we can start to see the differences and the similarities in states’ positions.

However, I suspect that the meeting in August is going to focus mainly on procedural issues around the adoption of the chair’s report, and the framing of what’s called the mandate for future work of the Group of Governmental Experts. That probably means that, rather than so much focus on the substance, we’re going to hear a lot of procedural talk in the room.

That said, in the margins, I think there’s still a very good opportunity for us to start to build confidence and a sense of partnership amongst states and non-governmental organizations and other actors who are keen to work towards the negotiation of an instrument on autonomous weapon systems. I think building that partnership between sort of progressive states and civil society actors and perhaps others from the corporate sector, building that partnership is going to be critical to developing a political dynamic for the period ahead.

Ariel: I’d like to go back, quickly, to this idea of human control. A while back, I talked with Heather Roff, and she gave this example, I think it was the empty hangar problem. Essentially, the idea is that no one expects some military leader to walk down to the airplane hangar and discover that the planes have all gone off to war without anyone saying something.

I think that gets at some of the confusion as to what human control looks like. You’d mentioned briefly the idea that a computer tells a human to push a button, and the human does that, but even in fully autonomous weapon systems, I think there would still be humans somewhere in the picture. So I was wondering if you could elaborate a little bit more on maybe some specifics of what it looks like for a human to have control or maybe where it starts to get fuzzy.

Richard: I think that we recognize that in the development of weapon technologies, already we see significant levels of automation, and a degree of handing over certain functions to sensors and to assistance from algorithms and the like. There are a number of areas that I think are of particular concern to us. I think, in a way, this is to recognize that a commander needs to have a sufficient contextual understanding of where it is that actual applications of force are likely to occur.

Already, we have weapon systems that might be projected over a relatively small area, and within that area, they will identify the heat shape of an armored fighting vehicle for example, and they may direct force against that object. That’s relatively accepted in current practice, but I think it’s accepted so long as we recognize that the area over which any application of force may occur is actually relatively bounded, and it’s occurring relatively shortly after a commander has initiated that mission.

Where I think my concerns, our concerns, lie is that that model of operation could be expanded over a greater area of space on the ground, and over a longer period of time. As that period of time and that area of space on the ground increase, then the ability of a commander to actually make an informed assessment about the likely implications of the specific applications of force that take place within that envelope becomes significantly diluted, to the point of being more or less meaningless.

For us, this is linked also to the concept of attacks as a term in international law. There’s a legal obligation that bears on human commanders at the unit of the attack, so there are certain legal obligations that a human has to fulfill for an attack. Now an attack doesn’t mean firing one bullet. An attack could contain a number of applications of actual force, but it seems to us that if you simply expand the space and the time over which an individual weapon system can identify target objects for itself, ultimately you’re eroding that notion of an attack, which is actually a fundamental building block of the structure of the law. You’re diluting that legal framework to the point of it arguably being meaningless.

We want to see a reasonably constrained period of, say, let’s call it independence of operation for a system, it may not be fully independent, but where a commander has the ability to sufficiently understand the contextual parameters within which that operation is occurring.

Ariel: Can you speak at all, since you live in the UK, on what the UK stance is on autonomous weapons right now?

Richard: I would say the UK has, so far, been a somewhat reluctant dance partner on the issue of autonomous weapons. I do see some, I think, positive signs of movement in the UK’s policy articulations recently. One of the main problems they’ve had in the past is that they adopted a definition of lethal autonomous weapon systems, which is the terminology used in the CCW. It’s undetermined what this term lethal autonomous weapon systems means. That’s a sort of moving target in the debate, which makes the discussion quite complicated.

But the UK adopted a definition of that term which was somewhat in the realm of science fiction as far as we’re concerned. They describe lethal autonomous weapon systems as having the ability to understand a commander’s intent. I think, in doing so, they were suggesting an almost human-like intelligence within the system, which is a long way away, if even possible. It’s certainly a long way away from where we are now, and where already developments of autonomy in weapon systems are causing legal and practical management problems. By adopting that sort of futuristic definition, they a little bit ruled themselves out of being able to make constructive contributions to the actual debate about how much human control should there be in the use of force.

Now recently in certain publications, the UK has slightly opened up some space to recognize that that definition might actually not be so helpful, and maybe this focus on the human control element that needs to be retained is actually the most productive way forward. Now how positive the UK will be, from my perspective, in that discussion, and then talking about the level of human control that needs to be retained? I think that remains to be seen, but I think at least they’re engaging with some recognition that that’s the area where there needs to be more policy substance. So fingers crossed.

Ariel: I’d asked Richard about the UK’s stance on autonomous weapons, but this is a global issue. I turned to Mary Wareham and Bonnie Docherty for more in-depth information about international efforts at the United Nations to ban lethal autonomous weapons.

Bonnie: My name’s Bonnie Docherty. I’m a senior researcher at Human Rights Watch, and also the director of Armed Conflict and Civilian Protection at Harvard Law School’s International Human Rights Clinic. I’ve been working on fully autonomous weapons since the beginning of the campaign doing most of the research and writing regarding the issue for Human Rights Watch and Harvard.

Mary: This is Mary Wareham. I’m the advocacy director of the Arms Division at Human Rights Watch. I serve as the global coordinator of the Campaign to Stop Killer Robots. This is the coalition of non-governmental organizations that we co-founded towards the end of 2012 and launched in April 2013.

Ariel: What prompted the formation of the Campaign to Stop Killer Robots?

Bonnie: Well, Human Rights Watch picked up this issue, and we published our first report in 2012. Our concern was the development of this new technology that raised a host of concerns: legal concerns, compliance with international humanitarian law and human rights law, moral concerns, accountability concerns, scientific concerns and so forth. We launched a report that was an initial foray into the issues, trying to preempt the development of these weapons before they came into existence, because once the genie’s out of the bottle, it’s hard to put it back in, hard to get countries to give up a new technology.

Mary: Maybe I can follow up there just to establish the Campaign to Stop Killer Robots. I did a lot of leg work in 2011, 2012 talking to a lot of the people that Bonnie was talking to for the preparation of the report. My questions were more about what should we do once we launch this report? Do you share the same concerns that we have at Human Rights Watch, and, if so, is there a need for a coordinated international civil society coalition to organize us going forward and to present a united voice and position to governments who we want to take action on this? For us, working that way in a coalition with other non-governmental organizations is what we do. We’ve been doing it for the last two decades on other humanitarian disarmament issues, the International Campaign to Ban Landmines, the Cluster Munition Coalition. We find it’s more effective when we all try to work together and provide a coordinated civil society voice. There was strong interest, and therefore, we co-founded the Campaign to Stop Killer Robots.

Ariel: What prompted you to consider a ban versus your trying to … I guess I don’t know other options there might have been.

Bonnie: We felt from the beginning that what was needed to address fully autonomous weapons is a preemptive ban on development, production and use. Some people have argued that existing law is adequate. Some people have argued you only need to regulate it, to limit it to certain circumstances, but in our mind a ban is essential, and that draws on past work on other conventional weapons such as landmines and cluster munitions, and more recently nuclear weapons.

The reason for a ban is that if you allow these weapons to exist, even to come into being, to be in countries’ arsenals, they will inevitably get in the hands of dictators or rogue actors that will use them against the law and against the rules of morality. They will harm combatants as well as civilians. It’s impossible once a weapon exists to restrict it to a certain circumstance. I think those who favor regulation assume the user will follow all the rules, and that’s just not the way it happens. We believe it should be preemptive because once they come into existence it’s too late. They will be harder to control, and so if you prevent them from even happening that will be the most effective solution.

The last point I’d make is that it also increases the stigma against the weapons, which can influence even countries that aren’t party to a treaty banning them. This is proven in past weapons treaties, and even there’s been a preemptive ban on blinding lasers in the 1990s, and that’s been very effective. There is legal precedent for this, and many arguments for why a ban is the best solution.

Mary: Yeah, there’s two ways of framing that call, which is not just the call of Human Rights Watch, but the call of the Campaign to Stop Killer Robots. We seek a preemptive ban on the development, production and use of fully autonomous weapons. That’s a kind of negative way of framing it. The positive way is that we want to retain meaningful human control over the use of force and over weapons systems going forward. There’s a lot of interest, and I’d say convergence on those two points.

We’re five years on since the launch of the campaign, 26 countries are now supporting the call for a ban and actively trying to get us there, and an even larger number of countries, actually, virtually all of the ones who’ve spoken to-date on this topic, acknowledge the need for some form of human control over the use of force and over weapons systems going forward. It’s been interesting to see in the five diplomatic meetings that governments have held on this topic since May 2014, the discussions keep returning to the notion of human control and the role of the human and how we can retain that going forward because autonomy and artificial intelligence are going to be used by militaries. What we want to do, though, is draw a normative line and provide some guidance and a framework going forward that we can work with.

Ariel: You just referred to them as fully autonomous weapons. At FLI we usually talk about lethal autonomous weapons versus non-lethal fully autonomous weapons, and so that sort of drives me to the question of, to what extent do definitions matter?

Then, this is probably a completely different question, how are lethal autonomous weapons different from conventional weapons? The reason I’m combining these two questions is because I’m guessing definition does play a little bit of a role there, but I’m not sure.

Bonnie: Well, for countries to make international law, they have to have a general, common understanding of what we’re talking about. Generally, in a legal treaty the last thing to be articulated is the actual definition. It’s premature to get a detailed, technical definition, but we feel that, although a variety of names have been used, lethal autonomous weapon systems, fully autonomous weapons, killer robots, in essence they’re all talking about the same thing. They’re all talking about a system that can select a target and choose to fire on that target without meaningful human control. There’s already convergence around this definition, even if it hasn’t been defined in detail. In terms of conventional munitions, they are, in essence, a conventional munition if they deploy conventional weapons. It depends on what the payload is. If a fully autonomous system were launching nuclear weapons it would not be a conventional weapon. If it’s launching cluster munitions it would be a conventional one. It’s not right to say they’re not conventional weapons.

Mary: The talks are being held at the Convention on Conventional Weapons in Geneva. This is where governments decided to house this topic. I think it’s natural for people to want to talk about definitions. From the beginning that’s what you do with a new topic, right? You try and figure out the boundaries of what you’re discussing here. Those talks in Geneva and the reporting that has been done to date and all of the discourse, I think it’s been pretty clear that this campaign and this focus on fully autonomous weapons is about kinetic weapons. It’s not about cyber, per se, it’s about actual things that can kill people physically.

I think the ICRC, the Red Cross, has made an important contribution with its suggestion to focus on the critical functions of weapons systems, which is what we were doing in the campaign, we just weren’t calling it that. That’s this action of identifying and selecting a target, and then firing on it, using force, lethal or otherwise. Those are the two functions that we want to ensure remain under human control, under meaningful human control.

For some others, some other states, they like to draw what we call the very wide definition of meaningful human control. For some of them it means good programming, nice design, a weapons review, a kind of legal review of if the weapon system will be legal and if they can proceed to develop it. You could kind of cast a very wide loop when you’re talking about meaningful human control, but for us the crux of the whole thing is about this notion of selecting targets and firing on them.

Ariel: What are the concerns that you have about this idea of non-human control? What worries you about that?

Mary: Of autonomy in weapon systems?

Ariel: Yeah, essentially, yes.

Mary: We’ve articulated legal concerns here at Human Rights Watch just because that’s where we always start, and that’s Bonnie’s area of expertise, but there are much broader concerns here that we’re also worried about, too. This notion of crossing a moral line and permitting a machine to take human life on the battlefield or in policing or in border control and other circumstances, that’s abhorrent, and that’s something that the Nobel Peace Laureates, the faith leaders and the others involved in the Campaign to Stop Killer Robots want to prevent. For them that’s a step too far.

They also worry about outsourcing killing to machines. Where’s the ethics in that? Then, what impact is this going to have on the system that we have in place globally? How will it be destabilizing in various regions, and, as a whole, what will happen when dictators and one-party states and military regimes get ahold of fully autonomous weapons? How will they use them? How will non-state armed groups use them?

Bonnie: I would just add, building on what Mary said, another reason human control is so important is that humans bring judgment. They bring legal and ethical judgment based on their innate characteristics, on their understanding of another human being, of the mores of a culture, and that is something a robot cannot bring; certain things cannot be programmed. For example, when they’re weighing whether the military advantage will justify an attack if it causes civilian harm, they apply that judgment, which is both legal and ethical. A robot won’t have that, that’s a human thing. Losing humanity in the use of force could, potentially, violate the law, as well as raise the serious moral concerns that Mary discussed.

Ariel: I want to go back to the process to get these weapons banned. It’s been going on for quite a few years now. I was curious, is that slow, or is that just sort of the normal speed for banning a weapon?

Mary: Look at nuclear weapons, Ariel.

Ariel: Yeah, that’s a good point. That took a while.

Mary: That took so many years, you know? That’s the example that we’re trying to avoid here. We don’t want to be negotiating a non-proliferation treaty in 20 years’ time with the small number of countries who’ve got these and the other states who don’t. We’re at a crossroads here. Sorry to interrupt you.

Ariel: No, that was a good point.

Mary: There have been five meetings on this topic to date at the United Nations in Geneva, but each of those meetings has only been up to a week long, so, really, it’s only five weeks of talks that have happened in the last four years. That’s not much time to make a lot of progress to get everybody around the same table understanding, but I think there’s definitely been some progress in those talks to delineate the parameters of this issue, to explore it and begin to pull apart the notion of human control and how you can ensure that that’s retained in weapons systems in the selection of targets and the use of force. There’s a wide range of different levels of knowledge on this issue, not just in civil society and academia and in the public, but also within governments.

There’s a lot of leg work to be done there to increase the awareness, but also the confidence of governments to feel like they can deal with this. What’s happened, especially I think in the past year, has been increased calls to now move from exploring the issue and talking about the parameters of the challenge to, “What are we going to do about it?” That’s going to be the big debate at the next meeting, which is coming up at the end of August: what will the recommendation be for future work? Are the governments going to keep talking about this, which we hope they do, but what are they going to do about it, more importantly?

We’re seeing, I think, a groundswell of support now for moving towards an outcome. States realize that they do not have the time or the money to waste on inconclusive deliberations, and so they need to be exploring options on pathways forward, but there’s really not that many options. As has been mentioned, states can talk about international law and the existing rules and how they can apply them and have more transparency there, but I think we’ve moved beyond that.

There’s kind of a couple of possibilities which will be debated. One is political measures, political non-binding declaration. Can we get agreement on some form of principles over human control? That sounds good, but it doesn’t go nearly far enough. We could create new international law. How do we do that in this particular treaty at the Convention on Conventional Weapons? You move to a negotiating mandate, and you set the objective of negotiating a new protocol under the Convention on Conventional Weapons. At the moment, there has been no agreement to move to negotiate new international law, but we’re expecting that to be the main topic of debate at the next meeting because they have to decide now what they’re going to do next year.

For us, the biggest, I think, developments are happening outside of the room right now rather than in Geneva itself. There’s a lot of activity now starting to happen in national capitals by governments to try and figure out what their position is on this, what their policy is on this, but there’s more prodding and questioning and debate starting to happen in national parliaments, and that has to happen in order to determine what the government position is on this and what’s going to happen on it. Then we have the examples of the open letters, the sign-on letters, ethical principles, there’s all sorts of new things that are coming out in recent weeks that I think will be relevant to what the governments are discussing, and we hope will provide them with impetus to move forward with focus and purpose here.

We can’t put a timeline on by when they might create a new international treaty, but we’re saying you can do this quickly if you put your mind to it and you say that this is what you want to try and achieve. We believe that if they move to a negotiating mandate at the end of this year, they could negotiate the treaty next year. Negotiating the treaty is not the part that takes the long time. It’s about getting everybody into the position where they want to create new international law. The actual process of negotiating that law should be relatively swift. If it takes longer than a year or two, then it runs the risk of turning into another set of inconclusive deliberations that don’t produce anything. For us, the goal is absolutely crucial to get in there at the beginning. The goal at the moment has gone from informal talks to formal talks, but, still, with no option or outcome.

Ariel: What is some of the resistance that you’re facing to moving towards a ban? Are governments worried that they’re going to miss out on a great technology, or is there some other reason that they’re resisting?

Mary: Just to say, 85 countries have spoken out on this topic to date. Most of them not at any great length, but just to say, “This is important. We’re concerned. We support the international talks.” We have a majority of countries now who want to move towards negotiating new international law. Who’s blocking at the moment? At the last round of talks and at the previous ones it was basically Israel, Russia and the United States who were saying it’s premature to decide where these talks should lead. We need to further explore and discuss the issues before we can make any progress. For others, now people are less patient with that position, and it will be interesting to see if those three countries in particular change their minds here.

The particular treaty that we’re at, the Convention on Conventional Weapons, the states there take their decisions by consensus, which means they can’t vote. There’s no voting procedures there. They have to strive for consensus where everybody in the room agrees, or at least does not object to moving forward. That threat of a kind of a blocking of consensus is always there, especially from Russia, but we’ll see. There’s no kind of pro-killer robot state which is saying, “We want these things. We need these things,” right now, at least not in the diplomatic talks. The only countries who have wanted to talk about the potential advantages or benefits are Israel and the United States. All of the other countries who speak about this are more concerned about understanding and coming to grips with all of the challenges that are raised, and then figuring out what the regulatory framework should be.

Ariel: Bonnie, was there anything you wanted to add to that?

Bonnie: I think Mary summarized the key points. I was just going to say that there are some people who would argue that we should wait and see what the technology would bring, we don’t know where it’ll go. Our argument counter to that is something called the precautionary principle, that even if there’s scientific uncertainty about where a technology will go, if there’s a significant risk of public harm, which there is in this case, the scientific uncertainty should not stand in the way of action. I think that the growing number of states that have expressed concern about these weapons, and the majority, almost a consensus, emerging around the need for human control show that there is willingness to act at this point. As Mary said, this is not a situation where people are advocating for these weapons, and I think that in the long run the agreement that there should be human control over the use of force will outweigh any hesitation based on the wait-and-see approach.

Mary: We had a good proposal, or not proposal, but offer from the United Nations Secretary General in this big agenda for disarmament framework that he launched a couple of months ago, saying that he stands ready to support the efforts of UN member states to elaborate new measures on lethal autonomous weapon systems, including legally-binding arrangements. For him, he wants states to ensure that humans remain at all times in control over the use of force. To have that kind of offer of support from the highest level at the United Nations I think is very important.

The other recent pledges and commitments, the one by the 200 technology companies and more than 2600 scientists and AI experts and other individuals committing not to develop lethal autonomous weapons systems, that’s a very powerful message, I think, to the states that these groups and individuals are not going to wait for the regulation. They’re committing not to do it, and this is what they expect the governments to do as well. We also saw the ethical principles issued by Google in recent weeks and this pledge by the company not to design or develop artificial intelligence for use in weapons. All of these efforts and initiatives are very relevant to what states need to do going forward. This is why we in the Campaign to Stop Killer Robots welcome them and encourage them, and want to ensure that we have as much of a broad-based appeal to support the government action that we need taken.

Ariel: Can you talk a little bit about what’s happening with China? Because they’ve sort of supported a ban. They’re listed as supporting a ban, but it’s complicated.

Mary: It’s funny because so many other countries that have come forward and endorsed the call for a ban have not elicited the same amount of attention. I guess it’s obviously interesting, though, for China to do this because everybody knows about the investments that China is making into military applications of artificial intelligence and autonomy. We see the weapons systems that are in development at the moment, including swarms of very small miniature drones, and where will that head?

What China thinks about this issue matters. At the last meeting, China basically endorsed the call for a ban, but said (there’s always a but) that its support was limited to prohibiting use only, and did not address development or production. For us it’s a partial ban, but we put them on the list that the campaign maintains, and they’re the first state to have an asterisk by its entry saying, “Look, China is on the ban list, but it’s not fully committed here.” We needed to acknowledge that because it wasn’t really the first time that China had hinted it would support creating new international law. It has been hinting at this in previous papers, including one in which China’s review of existing international law found so many questions and doubts raised that it does see a need to create international law specific to fully autonomous weapons systems. China gave the example of the blinding lasers protocol at the CCW, which prohibits laser weapons that would permanently blind human soldiers.

I think the real news on China is that its position now saying that existing law is insufficient and we need to create new international rules, splits the P5, the permanent five members of the United Nations Security Council. You have Russia and the United States arguing that it’s too early to determine what the outcome should be, and the UK — Richard can explain better exactly what the UK wants — but it seems to be satisfied with the status quo. Then France is pursuing a political declaration, but not legally-binding measures. There’s not unity anymore in that group of five permanent members of the Security Council, and those states do matter because they are some of the ones who are best-placed to be developing and investing in increasingly autonomous weapons systems.

Ariel: Okay. I wanted to also ask, unrelated, right now what you’re trying to do, what we’re trying to do, is get a ban, a preemptive ban on a weapon that doesn’t exist. What are some examples in the past of that having succeeded, as opposed to proving some humanitarian disaster as the result of a weapon?

Bonnie: Well, the main precedent for that is the preemptive ban on blinding lasers, which is a protocol to the Convention on Conventional Weapons. We did some research a few years ago into the motives behind the preemptive ban on blinding lasers, and many of them are the same. They raised concerns about the ethics of permanently blinding someone, whether it’s a combatant or a civilian. They raised concerns about the threat of an arms race. They raised concerns that there be a ban, but that it not impede peaceful development in that area. That ban has been very successful. It has not impeded the peaceful use of lasers for many civilian purposes, but it has created a stigma against and a legally-binding ruling against using blinding lasers. We think that that’s an excellent model for fully autonomous weapons, and it also appeared in the same treaty at which these fully autonomous weapons or lethal autonomous weapon systems are being discussed right now. It’s a good model to look at.

Mary: Bonnie, I really like that paper that you did on the other precedents for retaining human control over weapons systems. The notion that looking at past weapons that have been prohibited and finding that, in many instances, it’s because of the uncontrollable effects that the weapons create, from chemical weapons and biological and toxin ones to antipersonnel landmines where, once deployed, you cannot control them anymore. This is the kind of notion of being able to control the weapon system once it’s activated that has driven those previous negotiations, right?

Bonnie: Correct. There’s precedent for a preemptive ban, but there’s also precedent for a desire to maintain human control over weapons. As Mary said, there are several treaties, chemical weapons, biological weapons and landmines, all have been banned, in large part because people in governments were concerned about losing control over the weapons system. In essence, it’s the same model here, that by launching fully autonomous weapons you’d be losing control over the use of force. I think there’s a precedent for a ban, and there’s a precedent for a preemptive ban, all of which are applicable in this situation.

Ariel: I talked to Paul Scharre a little bit earlier, and one of the things that he talked about were treaties that were developed as a result of the powers that be, recognizing that the weapon would be too big of a risk for them, and so they agreed to ban a weapon. Then, the other sort of driving force for treaties was usually civil societies and based on sort of the general public saying, “This is not okay.” What role do you see for both of those situations here?

Bonnie: There’s a multitude of reasons why these weapons should be banned, and I think both the ones you mentioned are valid in this case. From our point of view, the main concern is a humanitarian one, and that’s civil society’s focus. We’re concerned about the risk to civilians. We’re concerned about moral issues, and those matters. That builds on past, what they call humanitarian disarmament treaties, treaties designed to protect humanity through legal norms, and, traditionally, often through bans, bans of landmines, cluster munitions and nuclear weapons.

There have been other treaties, sometimes they overlap, that have been driven more for security reasons. Countries that are concerned about other nations getting their hands on these weapons, and that they feel in the long run it’s better for no one to have them than for others to have them. Certainly, chemical weapons was an example of that. This does not mean that a treaty can’t be motivated for both reasons. That often happens, and I think both reasons are applicable here, but they just have come from slightly different trajectories.

Mary: It’s pretty amazing some of the diplomatic talks that we’ve been in on killer robots where we hear the governments debating the ethics of whether or not a specific weapon system such as fully autonomous weapons should be permitted, should be allowed. It’s rare that that happens. Normally, we are dealing with the aftermath of the consequences of proliferation and of widespread use and widespread production and stockpiling. This is an opportunity to do something in advance here, and it does kind of lead to a little bit of, I’d say, a North-South divide between the kind of military powers who have the resources at their disposal to invest in increasingly autonomous technology and try and push the boundaries, and then the vast majority of countries who are asking, “What’s the point of all of this? Where is the relevance of the UN Charter which talks about general and complete disarmament as being the ultimate objective?” They ask, “Have we lost that goal here? Is the ultimate objective to create more and better and more sophisticated weapons systems, or is it to end war and deal with the consequences through disarmament of warfare?”

Those are kind of really big-picture questions that are raised in this debate, and ones that we leave to those governments to make, but I think it is indicative of why there is so much interest in this particular concern, and that’s demonstrated by just the sheer number of governments who are participating in the international talks. The international talks, they’re in the setting called a Group of Governmental Experts, but this is not about a dozen guys sitting around the table in a small room. This is a big plenary meeting with more than 80 countries following, engaging, and avidly trying to figure out what to do.

Ariel: In terms of just helping people understand how the UN works, what role does a group like the Campaign to Stop Killer Robots play in the upcoming meeting? If, ultimately, the decision is made by the states and the nations, what is your role?

Mary: Our role is 24/7, all year round. These international meetings only happen a couple of times a year. This will be the second week this year. Most of our work this year has been happening in capitals and in places outside of the diplomatic meetings because that’s where you really make progress: through the parliamentary initiatives, through reaching the high-level political leadership, through engaging the public, through talking to the media and getting an increased awareness about the challenges here and the need for action. All of those things are what makes things move inside the room with the diplomacy because the diplomats need instructions from capitals in order to really progress.

At the meeting itself, we seek to provide a diverse delegation that’s not just people from Europe and North America, but from around the world because this is a multilateral meeting. We need to ensure that we can reach out and engage with all of the delegates in the room because every country matters on this issue, and every country has questions. Can we answer all those questions? Probably not, but we can talk through them with those states, try and address the concerns, and try and be a valued partner in the deliberations that are happening. It’s the normal way of working for us here at Human Rights Watch, is to work alongside other organizations through coordinated civil society initiatives so that you don’t go to the meeting and have like 50 statements from different NGOs. You have just a few, or just one so that you can be absolutely clear and guiding where you want to see the deliberations go and the outcome that you want.

We’ll be holding side events and other efforts to engage with the delegates in different ways, as well as presenting new research and reports. I think you’ve got something coming out, Bonnie, right?

Bonnie: We’ll be releasing a new report on the Martens Clause, which is a provision of international law, in the Geneva Conventions and other treaties, that brings ethics into law. It basically has two prongs, which we’ll elaborate on in the report, but it says that countries must comply with the principles of humanity and the dictates of public conscience. In short, we believe fully autonomous weapons raise concerns over both of those. We believe losing human control will violate basic principles of humanity, and that the groundswell of opposition that’s growing among not only governments, but also faith leaders, scientists, tech companies, academics, civil society, et cetera, shows that the public conscience is coming out against fully autonomous weapons and for maintaining human control over the use of force.

Ariel: To continue with this idea of the ethical issues surrounding lethal autonomous weapons, we’re joined now by Peter Asaro.

Peter: I’m Peter Asaro. I’m an Associate Professor in the School of Media Studies at the New School University in New York City, and I’m also the co-founder and vice chair of the International Committee for Robot Arms Control, which is part of the leadership steering committee of the Campaign to Stop Killer Robots, which is a coalition of NGOs that’s working at the UN to ban fully autonomous weapons.

Ariel: Could you tell us a little bit about how you got involved with this and what first gave you cause for concern?

Peter: My background is in philosophy and computer science, and I did a lot of work in artificial intelligence and in the philosophy of artificial intelligence. The history of science and early computing, and the development of neural networks and the sort of mathematical and computational theories behind all of that in the 1930s, ’40s, ’50s, and ’60s, was my graduate work, and as part of that, I got really interested in the kind of modern or contemporary applications of both artificial intelligence and robotics, and specifically the kind of embodied forms of artificial intelligence, which are robotic in various ways, and got really interested in not just intelligence, but social interaction.

That sort of snowballed into thinking about robot ethics and what seems the most pressing issue within robot ethics was the use of violence, the use of force, and whether we would allow robots to kill people, and of course the first place that that was gonna happen would be the military. So, I’d been thinking a lot about the ethics of military robotics from the perspective of just war theory, but also a broad range of philosophical legal perspectives as well.

That got me involved with Noel Sharkey and some other people who were interested in this from a policy perspective and we launched the International Committee for Robot Arms Control back in 2009, and then in 2012, we got together with Human Rights Watch and a number of other NGOs to form the Campaign to Stop Killer Robots.

Ariel: That leads into the next question I have for you, and it’s very broad. Can you talk a little bit about what some of the ethical issues are surrounding robots and more specifically autonomous weapons in warfare?

Peter: I think of course there’s a whole host of ethical issues around robotics in general: privacy, safety, sort of the big ones, but all sorts of more complicated ones as well, job displacement, how we treat them, and the impacts on society and things like that. Within the military context, I think the issues are sort of clearer in some sense, because it’s mostly around the use of autonomous systems in lethal force.

So the primary question is should we allow autonomous weapons systems to make lethal decisions independently of human control or human judgment, however you frame that. And then sort of subsidiary to that, some would argue does the programming within a system constitute that kind of human control or decision making. From my perspective, pre-programming doesn’t really do that, and that’s because I come from a philosophical background and so we look at just war theory and you look at ethics, especially Kantian ethics, and the requirements for the morality of killing. So, killing is generally speaking immoral, but there are certain exceptions, and those are generally self-defense or collective self-defense in the case of war, but in order to justify that killing, you need reasons and justifications. And a machine, with computational reasoning, at least at this stage of development, is not the type of system that has reasons. It follows rules: if certain conditions are met, a rule is applied and a result is obtained. But making a reasoned judgment about whether to use lethal force or whether to take a human life depends on a deeper understanding of reason, and I think that’s a sort of moral agency, it’s a moral decision making, and moral judgment that requires capacities that automated decision making systems just don’t have.

Maybe down the road in the future, machines will become conscious, machines will understand the meaning of life, machines will understand what it means to take a life, machines will be able to recognize human beings as humans who deserve rights that need to be respected, and systems may understand what it means to have a duty to respect the rights of others. But simply programming rules into machines doesn’t really do that. So, from a legal perspective as well, there’s no real accountability for these sorts of systems because they’re not legal agents, they’re not moral agents, you cannot sue a computer or a robot. You cannot charge them with crimes and put them in jail and things like that.

So, we have an entire legal system as well as a moral framework that assumes that humans are the responsible agents and the ones making decisions, and as soon as you start replacing that decision making with automated systems, you start to create significant problems for the regulation of these systems and for accountability and for justice. And then that leads directly to problems of safety and control, and what kinds of systems are gonna be fielded, what are gonna be the implications of that for international stability, who’s gonna have access to that, what are the implications for civilians and civilian infrastructures that might be targeted by these systems.

Ariel: I had wanted to go into some of this legality and liability stuff that you’ve brought up and you sort of given a nice overview of it as it is, but I was hoping you could expand a little bit on how this becomes a liability issue, and also … This is probably sort of an obvious question, but if you could touch a little on just how complicated it is to change the laws so that they would apply to autonomous systems as opposed to humans.

Peter: A lot of the work I’ve been doing under a grant for the Future of Life Institute looks at liability in increasingly autonomous systems. I know within civilian domestic application, of course the big application that everybody’s looking at at the moment is the self-driving car, so you can ask this question, who’s responsible when the self-driving car creates an accident. And the way that liability law works, of course somebody somewhere is always going to wind up being responsible. The law will find a way to hold somebody responsible. The question is whether existing precedents and the ways of doing things under current legal frameworks are really just or are really the best way going forward as we have these kinds of increasingly autonomous systems.

So, in terms of holding persons responsible and liable, so under tort law, if you have an accident, then you can sue somebody. This isn’t criminal law, this is the law of torts, and under that, then you sort of receive monetary compensation for damages done. But ideally, the person, or agents, or company or what have you that causes the harm is the one that should pay. Of course, that’s not always true, and the way that liability works, does things like joint and several liability in which, even though one party only had a small hand in causing a harm, they may have lots of money, like a government or a state, or a city, or something like that, and so they may actually wind up paying far more as a share of damages than they actually contributed to a problem.

You also have situations of strict liability such that even if your agency in causing a problem was very limited, you can still be held fully responsible for the implications. There’s some interesting parallels here with the keeping of animals, which are kind of autonomous systems in a sense. They have their minds of their own, they sort of do things. On the other hand, we expect them to be well behaved and well trained, at least for domestic animals. So generally speaking, you have liability for harms caused by your dog or your horse and so forth as a domesticated animal, but you don’t have strict liability. So, you actually have to show that maybe you’ve trained your dog to attack or you’ve failed to properly train your horse or keep it in a stable or what have you, whereas if you keep a tiger or something like that and it gets out and causes harm, then you’re strictly liable.

So the question is for a robot, should you be strictly liable for the robots that you create or the robots that you own? Should corporations that manufacture these systems be strictly liable for all of the accidents of self-driving cars? And while that seems like a good policy from the perspective of the public, because all the harms that are caused by these systems will be compensated, that could also stifle innovation. In the car sector, that doesn’t seem to be a problem. As it turns out, the president of Volvo said that they will accept strict liability for all of their self-driving cars. Tesla Motors has released a number of autopilot systems for their cars and more or less accepted the liability for that, although there’s only been a few accidents, so the actual jurisprudence or case law is still really emerging around that.

But those are, I think, a technology where the cars are very expensive, there’s a lot of money to be made in self-driving cars, and so the expectation of the car companies is that there will be very few accidents and that they can really afford to pay the damages for all those accidents. Now, is that gonna be true for personal robots? So, if you have a personal assistant, sort of butler robot who maybe goes on shopping errands and things like that for you, there’s a potential for them to cause significant economic damage. They’re probably not gonna be nearly as expensive as cars, hopefully, and it’s not clear that the market for them is going to be as big, and it’s not clear that companies would be able to absorb the cost of strict liability. So, there’s a question of whether that’s really the best policy for those kinds of systems.

Then there’s also questions of ability of people to modify their systems, so if you’re holding companies strictly responsible for their products, then those companies are not going to allow consumers to modify those products in any way, because that would affect their ability to control them. If you want a kind of DIY culture around autonomous systems and robotics, then you’re gonna see a lot of people modifying these systems, reprogramming these systems. So you also want, I think, a kind of strict liability around anybody who does those kinds of modifications rather than the manufacturer, and that’s to sort of break the seal and you accept all the responsibility for what happens.

And I think that’s sort of one side of it. Now, on the military side of it, you don’t really have torts in the same way. There’s of course a couple of extreme issues around torts in war, but generally speaking, militaries do not pay monetary damages when they make mistakes. If they accidentally blow up the wrong building, they don’t pay to build a new building. That’s just considered a casualty of war and an accident, and it’s not even necessarily a war crime or anything else, because you don’t have these kind of mechanisms where you can sue an invading army for dropping a bomb in the wrong place.

The idea that liability is going to act as an accountability measure on autonomous systems is just silly, I think, in warfare, because you just, you can’t sue people in war, basically. There’s a few exceptions and the governments that purchase weapons systems can sue the manufacturers, and that’s the sort of sense in which there is an ability to do that, but even most of those cases have been largely unsuccessful. Generally, those kinds of lawsuits are based on contracts and not the actual performance or damages caused by an actual system. So, you don’t really have that entire regulatory mechanism, so if you have a government that’s concerned about not harming civilians and not bombing the wrong buildings and things like that, of course, then they’re incentivized to put pressure on manufacturers to build systems that perform well, and that’s one of the sort of drivers of that technology.

But it’s a much weaker force if you think about what the engineers in a car company are thinking about in terms of safety and the kind of bottom line for their company if they make a product that causes accidents versus how that’s thought about in a defense company, where certainly they’re trying to protect civilians and ensure that systems work correctly, but they don’t have that enormously powerful economic concern about lawsuits in the future. The idea that the technology is going to be driven by similar forces, it doesn’t really apply. So that’s a big concern, I think, for the development of autonomous systems in the military sphere.

Ariel: Is there a worry or a risk that this sort of — I don’t know if it’s lack of liability, maybe it’s just whether or not we can trust the systems that are being built — but is there an increased risk of war crimes as a result of autonomous weapons, either intentionally or accidentally?

Peter: Yeah, I mean, the idea that there’s an increased risk of war crimes is kind of an interesting question, because the answer is simultaneously yes and no. What these autonomous systems actually do is diminish or remove, or put a distance between accountability of humans and their actions, or the consequences of their actions. So if you think of the autonomous system as a sort of intermediary between humans and the effects of their actions, there’s this sort of accountability gap that gets created. A system could go and do some horrendous act, like devastate a village and all the civilians in the village, and then we say, “Ah, is this a war crime?” And under international law as it stands, you’d have to prove intention, which is usually the most difficult part of war crimes tribunals, being able to actually demonstrate in court that a commander had the intention of committing some genocidal act or some war crime.

And you can build various forms of evidence for that. Now, if you send out an autonomous system, and you may not even know what that system is really gonna do and you don’t need to know exactly what it’s going to do when you give its orders, it becomes very easy to sort of distance yourself legally from what that system does in the field. Maybe you suspect it might do something terrible, and that’s what you really want, but it would be very easy then to sort of cover up your true intentions using these kinds of systems.

On the one hand, it would be much easier to commit war crimes. On the other hand, it’ll be much more difficult to prosecute or hold anybody accountable for war crimes that would be committed by autonomous weapons.

Ariel: You’ve also been producing some open letters this summer. There was one for academics calling on Google to stop work on Project Maven and … I’m sorry, you had another one… what was that one about?

Peter: The Amazon face recognition.

Ariel: Right. Right. Yeah. I was hoping you could talk a little bit about what you see as the role of academics and corporations and civil society in general in this debate about lethal autonomous weapons.

Peter: I think in terms of the debate of lethal autonomous weapons, civil society has a crucial role to play. I think in a broad range of humanitarian disarmament issues, and in the case of autonomous weapons, it’s really, it’s a technology that’s moving very quickly, and militaries are still a little bit unsure of exactly how they’re going to use it, but they’re very excited about it and they’re putting lots of research investment into new applications and trying to find new ways of using it. And I think that’s exciting from a research perspective, but it’s very concerning from a humanitarian and human rights perspective, because again, it’s not clear what kind of legal accountability will be around these systems. It’s not clear what kind of safety, control, and testing might be imposed on these systems, and it also seems quite clear that these systems are ready made for arms races and global and regional military destabilizations, where competitors are acquiring these systems and that has a potential to lead to conflict because of that destabilization itself. Then of course, the rapid proliferation.

So, in terms of civil society’s role, I think what we’ve been doing primarily is voicing the general concern of the broad public, who, globally and within the specific countries that we’ve surveyed, are largely opposed to these systems. Of course, the proponents say that’s just because they’ve seen too many sci fi movies and these things are gonna be just fine, but I don’t think that’s really the case. I think there’s some genuine fears and concerns that need to be addressed. So, we’ve also seen the involvement of a number of tech companies that are developing artificial intelligence, machine learning, robotics, and things like that.

And I think their interest and concern in this issue is twofold. We have companies like Clearpath Robotics, which is the largest robotics company in Canada, and also the largest supplier of robots to the Canadian military, whose engineers organized together to say that they do not want their systems to be used for autonomous weapons platforms, and they will not build them, but they also want to support the international campaign to ensure that governments don’t acquire their robots and then weaponize them. And they’re doing search and rescue robots and bomb disposal robots. There’s a similar movement amongst academics in artificial intelligence and robotics who have spent really their life’s work developing these fundamental technologies, who are then deeply concerned that the first and perhaps last application of this is going to be autonomous weapons, and the public will turn against artificial intelligence and robotics because of that, and then that these systems are genuinely scary and that we shouldn’t really be entrusting human lives or the decision to take human lives to these automated systems.

They have all kinds of great practical social applications and we should be pursuing those and just leave aside and really prohibit the use of these systems in the military context for autonomous targeting. And now I think we’re seeing more movement from the big companies, particularly this open letter that we’re a part of with Google, and their Project Maven. And Project Maven is a Pentagon project that aims at analyzing all the many thousands of hours of drone footage that the US military drones are collecting over Afghanistan and Iraq and various places where they’re operating. And it tries to automate, using machine learning, the identification of objects of interest, to kind of save time for human sensor analysts who have to pore through these images and then try to determine what that is.

And that in and of itself, that doesn’t seem too terrible, right? You’re just scanning through this imagery. But of course, this is really the first step to an automated target recognition system for drones, so if you wanted to fully automate drones, which currently require human operators to interpret the imagery to decide that this is something that should be targeted with a weapon and then to actually target and fire a weapon, that whole process is still controlled by humans. But if you wanted to automate it, the first thing you’d have to do is automate that visual analysis piece. So, Project Maven is trying to do exactly that, and to do that on a really big scale.

The other kind of issue from the perspective of a labor and research organization is that the Pentagon really has trouble, I think, attracting talent. There’s a really strong demand for artificial intelligence researchers and developers right now, because there’s so many applications and there’s so much business opportunity around it. It actually turns out the military opportunities are not nearly as lucrative as a lot of the other business applications. Google, and Amazon, and Facebook, and Microsoft can offer enormous salaries to people with PhDs in machine learning or even just masters degrees or some experience in systems development. And the Pentagon can’t compete with that on government salaries, and I think they’re even having trouble getting certain contracts with these companies. But when they get a contract with a company like Google, then they’re able to get access to really the top talent in artificial intelligence and their Cloud research groups and engineering, and also the sort of enormous capacity computationally of Google that has these massive data centers and processing capabilities.

And then you’re also getting … in some ways, Google is a company that collects data about people all over the world every day, all the time. Every Google search that you do, and there’s millions of Google searches per second or something in the world, so they have also the potential of applying the data that’s collected on the public in all these complicated ways. It’s really kind of a unique company in these respects. I think as a company that collects that kind of private data, they also have a certain obligation to society to ensure that that data isn’t used in detrimental ways, and siding with a single military in the world and using data that might be coming from users in countries where that military is operating, I think that’s deeply problematic.

We as academics kind of lined up with the engineers and researchers at Google who were already protesting Google’s involvement in this project. They were concerned about their involvement in the drone program. They were concerned about how this could be applied to autonomous weapons systems in the future. And they were just generally concerned with Google’s attempts to become a major military contractor and not just selling a simple service, like a word processor or a search, which they do anyway, but actually developing customized systems to do military operations, analyze these systems and apply their engineering skills and resources to that.

So, we really joined together as academics to support those workers. The workers passed around an open letter and then we passed around our letter, so the Google employees letter received over 4000 signatures and our letter from academics received almost 1200, a few shy. So, we really got a lot of mobilization and awareness, and then Google agreed to not renew that contract. So, they’re not dropping it, they’re gonna continue it till the end of the year, but they have said that they will not renew it in the future.

Ariel: Is there anything else that you think is important to mention?

Peter: I wrote a piece last night for a report on human dignity. So, I can just give you a little blurb about human dignity. I think the other kind of interesting ethical question around autonomous systems is this question of the right to human dignity and whether autonomous weapons or allowing robots to kill people would violate human dignity. I think some people have a very simplistic notion of human dignity, that it’s just some sort of aura or some kind of property that hangs around people and can be violated, but in fact I believe human dignity is a relation between people and this is a more Kantian view that human dignity means that you’re respected by others as a human. Others respect your rights, which doesn’t mean they can never violate them, but they have to have reasons and justifications that are sound in order to override your rights.

And in the case of human dignity, of course you can die in many terrible ways on a battlefield, but the question is whether the decision to kill you is justified and if it’s not, then it’s sort of an arbitrary killing. That means there’s no reasons for it, and I think if you look at the writings of the Special Rapporteur on extrajudicial, summary or arbitrary executions, he’s written some interesting papers on this, which is essentially that all killing by autonomous weapons would be arbitrary in this kind of legal sense, because these systems don’t have access to reasons for killing you to know that it’s actually justified to use lethal force in a given situation.

And that’s because they’re not reasoning in the same way that we are, but it’s also because they’re not human moral agents, and it’s important in a sense that they be human, because human dignity is something that we all lose when it’s violated. So, if you look at slavery or you look at torture, it’s not simply the person who’s being tortured or enslaved who is suffering, though of course they are, but it is in fact all of us who lose a certain value of human life and human dignity by the very existence of slavery or torture, and the acceptance of that.

In a similar way, if we accept the killing of humans by machines, then we’re really diminishing the nature of human dignity and the value of human life, in a broad sense that affects everybody, and I think that’s really true, and I think we really have to think about what it means to have human control over these systems to ensure that we’re not violating the rights and dignity of people when we’re engaged in armed conflict.

Ariel: Excellent. I think that was a nice addition. Thank you so much for taking the time to do this today.

We covered a lot of ground in these interviews, and yet we still only scratched the surface of what’s going on in the debate on lethal autonomous weapons. If you want to learn more, please visit and visit the research and reports page. On the FLI site, we’ve also addressed some of the common arguments we hear in favor of lethal autonomous weapons, and we explain why we don’t find those arguments convincing. And if you want to learn even more, of course there’s the Campaign to Stop Killer Robots website, ICRAC has a lot of useful information on their site, and Article 36 has good information, including their report on meaningful human control. And if you’re also concerned about a future with lethal autonomous weapons, please take a moment to sign the pledge. You can find links to the pledge and everything else we’ve talked about on the FLI page for this podcast.

I want to again thank Paul, Toby, Richard, Mary, Bonnie and Peter for taking the time to talk about their work with LAWS.

If you enjoyed this show, please take a moment to like it, share it and maybe even give it a good review. I’ll be back again at the end of next month discussing global AI policy. And don’t forget that Lucas Perry has a new podcast on AI value alignment, and a new episode from him will go live in the middle of the month.

Machine Reasoning and the Rise of Artificial General Intelligences: An Interview With Bart Selman

From Uber’s advanced computer vision system to Netflix’s innovative recommendation algorithm, machine learning technologies are nearly omnipresent in our society. They filter our emails, personalize our newsfeeds, update our GPS systems, and drive our personal assistants. However, despite the fact that such technologies are leading a revolution in artificial intelligence, some would contend that these machine learning systems aren’t truly intelligent.

The argument, in its most basic sense, centers on the fact that machine learning evolved from theories of pattern recognition and, as such, the capabilities of such systems generally extend to just one task and are centered on making predictions from existing data sets. AI researchers like Rodney Brooks, a former professor of Robotics at MIT, argue that true reasoning, and true intelligence, is several steps beyond these kinds of learning systems.

But if we already have machines that are proficient at learning through pattern recognition, how long will it be until we have machines that are capable of true reasoning, and how will AI evolve once it reaches this point?

Understanding the pace and path that artificial reasoning will follow over the coming decades is an important part of ensuring that AI is safe, and that it does not pose a threat to humanity; however, before it is possible to understand the feasibility of machine reasoning across different categories of cognition, and the path that artificial intelligences will likely follow as they continue their evolution, it is necessary to first define exactly what is meant by the term “reasoning.”


Understanding Intellect

Bart Selman is a professor of Computer Science at Cornell University. His research is dedicated to understanding the evolution of machine reasoning. According to his methodology, reasoning is described as taking pieces of information, combining them together, and using the fragments to draw logical conclusions or devise new information.

Sports provide a ready example for explaining what machine reasoning is really all about. When humans see soccer players on a field kicking a ball about, they can, with very little difficulty, ascertain that these individuals are soccer players. Today’s AI can also make this determination. However, humans can also see a person in a soccer outfit riding a bike down a city street, and they would still be able to infer that the person is a soccer player. Today’s AIs probably wouldn’t be able to make this connection.

This process of taking information that is known, uniting it with background knowledge, and making inferences regarding information that is unknown or uncertain is a reasoning process. To this end, Selman notes that machine reasoning is not about making predictions; it’s about using logical techniques (like the abductive inference in the soccer example) to answer a question or form an inference.
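To make the idea of combining known facts with background knowledge concrete, here is a minimal Python sketch of rule-based inference. It is an illustration only, not Selman's method: it uses simple forward chaining (a deductive technique), and it approximates the abductive step in the soccer example by writing the guess "a soccer outfit suggests a soccer player" as a hypothetical background rule. All fact and rule names are made up for this example.

```python
def forward_chain(facts, rules):
    """Apply rules repeatedly until no new facts can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule only when all of its premises are already known.
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical background knowledge: (premises, conclusion) pairs.
RULES = [
    (frozenset({"wears_soccer_outfit"}), "probably_soccer_player"),
    (frozenset({"probably_soccer_player", "riding_bike"}),
     "probably_travelling_to_or_from_a_game"),
]

# What is directly observed in the street scene.
observed = {"wears_soccer_outfit", "riding_bike"}

# Everything the rules add beyond the raw observations.
derived = forward_chain(observed, RULES) - observed
print(sorted(derived))
```

The second rule can only fire after the first has added its conclusion, which is the sense in which pieces of information are combined to produce new information.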

Since humans do not typically reason through pattern recognition and synthesis, but by using logical processes like induction, deduction, and abduction, Selman asserts that machine reasoning is a form of intelligence that is more like human intelligence. He continues by noting that the creation of machines that are endowed with more human-like reasoning processes, and breaking away from traditional pattern recognition approaches, is the key to making systems that not only predict outcomes but also understand and explain their solutions. However, Selman notes that making human-level AI is also the first step to attaining super-human levels of cognition.

And due to the existential threat this could pose to humanity, it is necessary to understand exactly how this evolution will unfold.


The Making of a (super)Mind

It may seem like truly intelligent AI is a problem for future generations. Yet, when it comes to machines, the consensus among AI experts is that rapid progress is already being made in machine reasoning. In fact, many researchers assert that human-level cognition will be achieved across a number of metrics in the next few decades. Yet, questions remain regarding how AI systems will advance once artificial general intelligence is realized. A key question is whether these advances can accelerate further and scale up to super-human intelligence.

This process is something that Selman has devoted his life to studying. Specifically, he researches the pace of AI scalability across different categories of cognition and the feasibility of super-human levels of cognition in machines.

Selman states that attempting to make blanket statements about when and how machines will surpass humans is a difficult task, as machine cognition is disjointed and does not draw a perfect parallel with human cognition. “In some ways, machines are far beyond what humans can do,” Selman explains, “for example, when it comes to certain areas in mathematics, machines can take billions of reasoning steps and see the truth of a statement in a fraction of a second. The human has no ability to do that kind of reasoning.”

However, when it comes to the kind of reasoning mentioned above, where meaning is derived from deductive or inductive processes that are based on the integration of new data, Selman says that computers are somewhat lacking. “In terms of the standard reasoning that humans are good at, they are not there yet,” he explains. Today’s systems are very good at some tasks, sometimes far better than humans, but only in a very narrow range of applications.

Given these variances, how can we determine how AI will evolve in various areas and understand how they will accelerate after general human level AI is achieved?

For his work, Selman relies on computational complexity theory, which has two primary functions. First, it can be used to characterize the efficiency of an algorithm used for solving instances of a problem. As Johns Hopkins’ Leslie Hall notes, “broadly stated, the computational complexity of an algorithm is a measure of how many steps the algorithm will require in the worst case for an instance of a given size.” Second, it is a method of classifying tasks (computational problems) according to their inherent difficulty. These two features provide us with a way of determining how artificial intelligences will likely evolve by offering a formal method of determining the easiest, and therefore most probable, areas of advancement. It also provides key insights into the speed of this scalability.
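The notion of counting worst-case steps for an instance of a given size can be illustrated with a small Python sketch. It is not taken from Selman's work. It simply compares two textbook search algorithms, counting one step per element examined, on an input chosen so that the target is never found, which is a worst case for both.

```python
def linear_search_steps(items, target):
    """Count the comparisons made by a left-to-right scan."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(items, target):
    """Count the probes made by binary search on a sorted list."""
    steps = 0
    low, high = 0, len(items) - 1
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if items[mid] == target:
            break
        if items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return steps


for size in (10, 1_000, 1_000_000):
    data = list(range(size))
    missing = size  # larger than every element, so it is never found
    print(size,
          linear_search_steps(data, missing),
          binary_search_steps(data, missing))
```

For a list of one million items, the scan needs one million steps while binary search needs 20. The first grows linearly with the instance size and the second logarithmically, and that difference in growth rate is the kind of distinction complexity theory formalizes.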

Ultimately, this work is important, as the abilities of our machines are fast-changing. As Selman notes, “The way that we measure the capabilities of programs that do reasoning is by looking at the number of facts that they can combine quickly. About 25 years ago, the best reasoning engines could combine approximately 200 or 300 facts and deduce new information from that. The current reasoning engines can combine millions of facts.” This exponential growth has great significance when it comes to the scale-up to human levels of machine reasoning.

As Selman explains, given the present abilities of our AI systems, it may seem like machines with true reasoning capabilities are still some ways off; however, thanks to the rapid rate of technological progress, we will likely start to see machines that have intellectual abilities that vastly outpace our own in rather short order. “Ten years from now, we’ll still find them very much lacking in understanding, but twenty or thirty years from now, machines will have likely built up the same knowledge that a young adult has,” Selman notes. Anticipating exactly when this transition will occur will help us better understand the actions that we should take, and the research that the current generation must invest in, in order to be prepared for this advancement.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

$2 Million Donated to Keep Artificial General Intelligence Beneficial and Robust

$2 million has been allocated to fund research that anticipates artificial general intelligence (AGI) and how it can be designed beneficially. The money was donated by Elon Musk to cover grants through the Future of Life Institute (FLI). Ten grants have been selected for funding.

Said Max Tegmark, president of FLI, “I’m optimistic that we can create an inspiring high-tech future with AI as long as we win the race between the growing power of AI and the wisdom with which we manage it. This research is to help develop that wisdom and increase the likelihood that AGI will be the best rather than the worst thing to happen to humanity.”

Today’s artificial intelligence (AI) is still quite narrow. That is, it can only accomplish narrow sets of tasks, such as playing chess or Go, driving a car, performing an Internet search, or translating languages. While the AI systems that master each of these tasks can perform them at superhuman levels, they can’t learn a new, unrelated skill set (e.g. an AI system that can search the Internet can’t learn to play Go with only its search algorithms).

These AI systems lack that “general” ability that humans have to make connections between disparate activities and experiences and to apply knowledge to a variety of fields. However, a significant number of AI researchers agree that AI could achieve a more “general” intelligence in the coming decades. No one knows how AI that’s as smart as or smarter than humans might impact our lives, whether it will prove to be beneficial or harmful, how we can design it safely, or even how to prepare society for advanced AI. And many researchers worry that the transition could occur quickly.

Anthony Aguirre, co-founder of FLI and physics professor at UC Santa Cruz, explains, “The breakthroughs necessary to have machine intelligences as flexible and powerful as our own may take 50 years. But with the major intellectual and financial resources now being directed at the problem it may take much less. If or when there is a breakthrough, what will that look like? Can we prepare? Can we design safety features now, and incorporate them into AI development, to ensure that powerful AI will continue to benefit society? Things may move very quickly and we need research in place to make sure they go well.”

Grant topics include: training multiple AIs to work together and learn from humans about how to coexist, training AI to understand individual human preferences, understanding what “general” actually means, incentivizing research groups to avoid a potentially dangerous AI race, and many more. As the request for proposals stated, “The focus of this RFP is on technical research or other projects enabling development of AI that is beneficial to society and robust in the sense that the benefits have some guarantees: our AI systems must do what we want them to do.”

FLI hopes that this round of grants will help ensure that AI remains beneficial as it becomes increasingly intelligent. The full list of FLI recipients and project titles includes:

Primary Investigator | Project Title | Amount Recommended
Allan Dafoe, Yale University | Governance of AI Programme | $276,000
Stefano Ermon, Stanford University | Value Alignment and Multi-agent Inverse Reinforcement Learning | $100,000
Owain Evans, Oxford University | Factored Cognition: Amplifying Human Cognition for Safely Scalable AGI | $225,000
The Anh Han, Teesside University | Incentives for Safety Agreement Compliance in AI Race | $224,747
Jose Hernandez-Orallo, University of Cambridge | Paradigms of Artificial General Intelligence and Their Associated Risks | $220,000
Marcus Hutter, Australian National University | The Control Problem for Universal AI: A Formal Investigation | $276,000
James Miller, Smith College | Utility Functions: A Guide for Artificial General Intelligence Theorists | $78,289
Dorsa Sadigh, Stanford University | Safe Learning and Verification of Human-AI Systems | $250,000
Peter Stone, University of Texas | Ad hoc Teamwork and Moral Feedback as a Framework for Safe Robot Behavior | $200,000
Josh Tenenbaum, MIT | Reverse Engineering Fair Cooperation | $150,000


Some of the grant recipients offered statements about why they’re excited about their new projects:

“The team here at the Governance of AI Program are excited to pursue this research with the support of FLI. We’ve identified a set of questions that we think are among the most important to tackle for securing robust governance of advanced AI, and strongly believe that with focused research and collaboration with others in this space, we can make productive headway on them.” -Allan Dafoe

“We are excited about this project because it provides a first unique and original opportunity to explicitly study the dynamics of safety-compliant behaviours within the ongoing AI research and development race, and hence potentially leading to model-based advice on how to timely regulate the present wave of developments and provide recommendations to policy makers and involved participants. It also provides an important opportunity to validate our prior results on the importance of commitments and other mechanisms of trust in inducing global pro-social behavior, thereby further promoting AI for the common good.” -The Anh Han

“We are excited about the potentials of this project. Our goal is to learn models of humans’ preferences, which can help us build algorithms for AGIs that can safely and reliably interact and collaborate with people.” -Dorsa Sadigh

This is FLI’s second grant round. The first launched in 2015, and a comprehensive list of papers, articles and information from that grant round can be found here. Both grant rounds are part of the original $10 million that Elon Musk pledged to AI safety research.

FLI cofounder, Viktoriya Krakovna, also added: “Our previous grant round promoted research on a diverse set of topics in AI safety and supported over 40 papers. The next grant round is more narrowly focused on research in AGI safety and strategy, and I am looking forward to great work in this area from our new grantees.”

Learn more about these projects here.

About the LAWS Pledge

AI Companies, Researchers, Engineers, Scientists, Entrepreneurs, and Others Sign Pledge Promising Not to Develop Lethal Autonomous Weapons

Leading AI companies and researchers take concrete action against killer robots, vowing never to develop them.

Stockholm, Sweden (July 18, 2018) After years of voicing concerns, AI leaders have, for the first time, taken concrete action against lethal autonomous weapons, signing a pledge to neither participate in nor support the development, manufacture, trade, or use of lethal autonomous weapons.

The pledge has been signed to date by over 160 AI-related companies and organizations from 36 countries, and 2,400 individuals from 90 countries. Signatories of the pledge include Google DeepMind, University College London, the XPRIZE Foundation, Clearpath Robotics/OTTO Motors, the European Association for AI (EurAI), the Swedish AI Society (SAIS), Demis Hassabis, British MP Alex Sobel, Elon Musk, Stuart Russell, Yoshua Bengio, Anca Dragan, and Toby Walsh.

Max Tegmark, president of the Future of Life Institute (FLI) which organized the effort, announced the pledge on July 18 in Stockholm, Sweden during the annual International Joint Conference on Artificial Intelligence (IJCAI), which draws over 5,000 of the world’s leading AI researchers. SAIS and EurAI were also organizers of this year’s IJCAI.

Said Tegmark, “I’m excited to see AI leaders shifting from talk to action, implementing a policy that politicians have thus far failed to put into effect. AI has huge potential to help the world – if we stigmatize and prevent its abuse. AI weapons that autonomously decide to kill people are as disgusting and destabilizing as bioweapons, and should be dealt with in the same way.”

Lethal autonomous weapons systems (LAWS) are weapons that can identify, target, and kill a person, without a human “in-the-loop.” That is, no person makes the final decision to authorize lethal force: the decision and authorization about whether or not someone will die is left to the autonomous weapons system. (This does not include today’s drones, which are under human control. It also does not include autonomous systems that merely defend against other weapons, since “lethal” implies killing a human.)

The pledge begins with the statement:

“Artificial intelligence (AI) is poised to play an increasing role in military systems. There is an urgent opportunity and necessity for citizens, policymakers, and leaders to distinguish between acceptable and unacceptable uses of AI.”

Another key organizer of the pledge, Toby Walsh, Scientia Professor of Artificial Intelligence at the University of New South Wales in Sydney, points out the thorny ethical issues surrounding LAWS. He states:

“We cannot hand over the decision as to who lives and who dies to machines. They do not have the ethics to do so. I encourage you and your organizations to pledge to ensure that war does not become more terrible in this way.”

Ryan Gariepy, Founder and CTO of both Clearpath Robotics and OTTO Motors, has long been a strong opponent of lethal autonomous weapons. He says:

“Clearpath continues to believe that the proliferation of lethal autonomous weapon systems remains a clear and present danger to the citizens of every country in the world. No nation will be safe, no matter how powerful. Clearpath’s concerns are shared by a wide variety of other key autonomous systems companies and developers, and we hope that governments around the world decide to invest their time and effort into autonomous systems which make their populations healthier, safer, and more productive instead of systems whose sole use is the deployment of lethal force.”

In addition to the ethical questions associated with LAWS, many advocates of an international ban on LAWS are concerned that these weapons will be difficult to control – easier to hack, more likely to end up on the black market, and easier for bad actors to obtain –  which could become destabilizing for all countries, as illustrated in the FLI-released video “Slaughterbots”.

In December 2016, the Review Conference of the Convention on Conventional Weapons (CCW) began formal discussion regarding LAWS at the UN. By the most recent meeting in April, twenty-six countries had announced support for some type of ban, including China. And such a ban is not without precedent. Biological weapons, chemical weapons, and space weapons were also banned not only for ethical and humanitarian reasons, but also for the destabilizing threat they posed.

The next UN meeting on LAWS will be held in August, and signatories of the pledge hope this commitment will encourage lawmakers to develop a commitment at the level of an international agreement between countries. As the pledge states:

“We, the undersigned, call upon governments and government leaders to create a future with strong international norms, regulations and laws against lethal autonomous weapons. … We ask that technology companies and organizations, as well as leaders, policymakers, and other individuals, join us in this pledge.”


AI Alignment Podcast: AI Safety, Possible Minds, and Simulated Worlds with Roman Yampolskiy

What role does cyber security play in AI alignment and safety? What is AI completeness? What is the space of mind design and what does it tell us about AI safety? How does the possibility of machine qualia fit into this space? Can we leakproof the singularity to ensure we are able to test AGI? And what is computational complexity theory anyway?

AI Safety, Possible Minds, and Simulated Worlds is the third podcast in the new AI Alignment series, hosted by Lucas Perry. For those of you who are new, this series will be covering and exploring the AI alignment problem across a large variety of domains, reflecting the fundamentally interdisciplinary nature of AI alignment. Broadly, we will be having discussions with technical and non-technical researchers across areas such as machine learning, AI safety, governance, coordination, ethics, philosophy, and psychology as they pertain to the project of creating beneficial AI. If this sounds interesting to you, we hope that you will join in the conversations by following us or subscribing to our podcasts on YouTube, SoundCloud, or your preferred podcast site/application.

If you’re interested in exploring the interdisciplinary nature of AI alignment, we suggest you take a look here at a preliminary landscape which begins to map this space.

In this podcast, Lucas spoke with Roman Yampolskiy, a Tenured Associate Professor in the department of Computer Engineering and Computer Science at the Speed School of Engineering, University of Louisville. Dr. Yampolskiy’s main areas of interest are AI Safety, Artificial Intelligence, Behavioral Biometrics, Cybersecurity, Digital Forensics, Games, Genetic Algorithms, and Pattern Recognition. He is an author of over 100 publications including multiple journal articles and books. 

Topics discussed in this episode include:

  • Cyber security applications to AI safety
  • Key concepts in Roman’s papers and books
  • Is AI alignment solvable?
  • The control problem
  • The ethics of and detecting qualia in machine intelligence
  • Machine ethics and its role, or lack thereof, in AI safety
  • Simulated worlds and if detecting base reality is possible
  • AI safety publicity strategy

In this interview we discuss ideas contained in upcoming and current work of Roman Yampolskiy. You can find them here: Artificial Intelligence Safety and Security and Artificial Superintelligence: A Futuristic Approach. You can find more of his work at his Google Scholar and/or university page and follow him on his Facebook or Twitter. You can hear about this work in the podcast above or read the transcript below.

Lucas: Hey everyone, welcome back to the AI Alignment Podcast Series with the Future of Life Institute. I’m Lucas Perry and today, we’ll be speaking with Dr. Roman Yampolskiy. This is the third installment in this new AI Alignment Series. If you’re interested in inverse reinforcement learning or the possibility of astronomical future suffering being brought about by advanced AI systems, make sure to check out the first two podcasts in this series.

As always, if you find this podcast interesting or useful, make sure to subscribe or follow us on your preferred listening platform. Dr. Roman Yampolskiy is a tenured associate professor in the Department of Computer Science and Engineering at the Speed School of Engineering at the University of Louisville. He is the founding and current director of the Cybersecurity Lab and an author of many books including Artificial Superintelligence: A Futuristic Approach.

Dr. Yampolskiy’s main areas of interest are in AI safety, artificial intelligence, behavioral biometrics, cybersecurity, digital forensics, games, genetic algorithms and pattern recognition. Today, we cover key concepts in his papers and books surrounding AI safety, artificial intelligence, superintelligence and AGI, his approach to AI alignment, and how AI security fits into all this. We also explore our audience-submitted questions. This was a very enjoyable conversation and I hope you find it valuable. With that, I give you Dr. Roman Yampolskiy.

Thanks so much for coming on the podcast, Roman. It’s really a pleasure to have you here.

Roman: It’s my pleasure.

Lucas: I guess let’s jump into this. You can give us a little bit more information about your background, what you’re focusing on. Take us a little bit through the evolution of Roman Yampolskiy and the computer science and AI field.

Roman: Sure. I got my PhD in Computer Science and Engineering. My dissertation work was on behavioral biometrics. Typically, that’s applied to profiling human behavior, but I took it to the next level looking at nonhuman entities, bots, artificially intelligent systems trying to see if we can apply same techniques, same tools to detect bots, to prevent bots, to separate natural human behavior from artificial behaviors.

From there, I tried to figure out, “Well, what’s the next step? As those artificial intelligence systems become more capable, can we keep up? Can we still enforce some security on them?” That naturally led me to looking at much more capable systems and the whole issues with AGI and superintelligence.

Lucas: Okay. In terms of applying biometrics to AI systems or software or computers in general, what does that look like and what is the end goal there? What are the metrics of the computer that you’re measuring and to what end are they used and what information can they give you?

Roman: The good example I can give you is from my dissertation work again. I was very interested with poker at the time. The poker rooms online were still legal in US and completely infested with bots. I had a few running myself. I knew about the problem and I was trying to figure out ways to automatically detect that behavior. Figure out which bot is playing and prevent them from participating and draining resources. That’s one example where you just have some sort of computational resource and you want to prevent spam bots or anything like that from stealing them.

Lucas: Okay, this is cool. Before you’ve arrived at this AGI and superintelligence stuff, could you explain a little bit more about what you’ve been up to? It seems like you’ve done a lot in computer security. Could you unpack a little bit about that?

Roman: All right. I was doing a lot of very standard work relating to pattern recognition, neural networks, just what most people do in terms of work on AI recognizing digits and handwriting and things of that nature. I did a lot of work in biometrics, so recognizing not just different behaviors but face recognition, fingerprint recognition, any type of forensic analysis.

I do run Cybersecurity Lab here at the University of Louisville. My students typically work on more well recognized sub domains of security. With them, we did a lot of work in all those domains, forensics, cryptography, security.

Lucas: Okay. Do you feel that all the security research, how much of it do you think is important or critical to or feeds into ASI and AGI research? How much of it right now is actually applicable or is making interesting discoveries, which can inform ASI and AGI thinking?

Roman: I think it’s fundamental. That’s where I get most of my tools and ideas for working with intelligent systems. Basically, everything we learned in security is now applicable. This is just a different type of cyber infrastructure. We learned to defend computers, networks. Now, we are trying to defend intelligent systems both from insider threats and outside from the systems themselves. That’s a novel angle, but pretty much everything I did before is now directly applicable. So many people working in AI safety approach it from other disciplines, philosophy, economics, political science. A lot of them don’t have the tools to see it as a computer science problem.

Lucas: The security aspect of it certainly makes sense. You’ve written on utility function security. If we’re to make value aligned systems, then it’s going to be important that the right sorts of people have control over them, and that their preferences and dispositions and the system’s utility function are secure. A system in the end I guess isn’t really safe or robust or value aligned if it’s extremely influenced by anyone.

Roman: Right. If someone can just disable your safety mechanism, do you really have a safe system? That completely defeats everything you did. You release a well-aligned, friendly system and then somebody flips a few bits and you got the exact opposite.

Lucas: Right. Given this research focus that you have in security and how it feeds into ASI and AGI thinking and research and AI alignment efforts, how would you just generally summarize your approach to AI alignment and safety?

Roman: There is not a general final conclusion I can give you. It’s still work in progress. I’m still trying to understand all the types of problems we are likely to face. I’m still trying to understand whether this problem is even solvable to begin with. Can we actually control more intelligent systems? I always look at it from engineering computer science point of view much less from philosophy ethics point of view.

Lucas: Whether or not this problem is in principle solvable, that has a lot to do with fundamental principles and ideas and facts about minds in general and what is possible of minds. Can you unpack a little bit more about what sorts of information we need or what we need to think about more going forward to know what it means whether or not this problem is solvable in principle, how we can figure that out as we continue forward?

Roman: There are multiple ways you can show that it’s solvable. The ideal situation is where you can produce some sort of a mathematical proof. That’s probably the hardest way to do it because it’s such a generic problem. It applies to all domains. It has to be still working under self-improvement and modification. It has to still work after learning of additional information and it has to be reliable against malevolent design, so purposeful modifications. It seems like it’s probably the hardest problem ever to be given to the mathematics community, if they are willing to take it on.

You can also look at examples just from experimental situations both with artificial systems. Are we good at controlling existing AIs? Can we make them safe? Can we make software safe in general? Also, natural systems. Are we any good at creating safe humans? Are we good at controlling people? Now, it seems like after millennia of efforts coming up with legal framework, ethical framework, religions, all sorts of ways of controlling people, we are pretty much failing at creating safe humans.

Lucas: I guess in the end, that might come down to fundamental issues in human hardware and software. Like the reproduction of human beings through sex and the way that genetics functions just creates a ton of variance in each person, which each person has different dispositions and preferences and other things. Then also the way that I guess software is run and shared across culture and people creates more fundamental issues that we might not have in software and machines because they work differently.

Are there existence proofs I guess with AI where AI is superintelligent in a narrow domain or at least above human intelligence in a narrow domain and we have control over such narrow systems? Would it be potentially generalizable as you sort of aggregate more and more AI systems, which are superintelligent in narrow domains that as you aggregate that or create an AGI, which sort of has meta learning, we would be able to have control over it given these existence proofs in narrow domains?

Roman: There are certainly such examples in narrow domains. If we’re creating, for example, a system to play chess, we can have a single number measuring its performance. We can control whether it is getting better or worse. That’s quite possible in a very limited linear domain. The problem is as complexity increases, you go from this n-body problem equals one to n-body equals infinity, and that’s very hard to solve both computationally and in terms of just understanding what in that hyperspace of possibilities is a desirable outcome.

It’s not just gluing together a few narrow AIs like, “Okay, I have a chess playing program. I have a Go playing program.” If I put them all in the same PC, do I now have general intelligence capable of moving knowledge across domains? Not exactly. Whatever safety you can prove for limited systems will not necessarily transfer to a more complex system, which integrates the components.

Very frequently, when you add two safe systems, the merged system has back doors, has problems. Same with adding additional safety mechanisms. A lot of times, you will install a patch for software to increase security and the patch itself has additional loopholes.

Lucas: Right. It’s not necessarily the case that in the end, AGI is actually just going to be sort of like an aggregation of a lot of AI systems, which are superintelligent in narrow domains. Rather, it potentially will be something more like an agent, which has very strong meta learning. So, learning about learning and learning how to learn and just learning in general, such that all the sorts of processes and things that it learns are deeply integrated at a lower level and there’s sort of like a higher level thinking that is able to execute on these things that it learned. Is that so?

Roman: That makes a lot of sense.

Lucas: Okay. Moving forward here, it would be nice if we could go ahead and explore a little bit of the key concepts in your books and papers and maybe get into some discussions there. I don’t want to spend a lot of time talking about each of the terms and having you define them as people can read your book, Artificial Superintelligence: A Futuristic Approach. They can also check out your papers and you’ve talked about these in other places. I think it will be helpful for giving some background and terms that people might not exactly be exposed to.

Roman: Sure.

Lucas: Moving forward, what can you tell us about what AI completeness is?

Roman: It’s a somewhat fuzzy term, kind of like the Turing test. It’s not very precisely defined, but I think it’s very useful. It seems that there are certain problems in artificial intelligence in general which require you to pretty much have general intelligence to solve them. If you are capable of solving one of them, then by definition, we can reduce other problems to that one and solve all problems in AI. In my papers, I talk about passing the Turing test as being the first such problem. If you can pass an unrestricted version of a Turing test, you can pretty much do anything.

Lucas: Right. I think people have some confusions here about what intelligence is, and the kinds of minds that can solve Turing tests completely and the architecture that they have and whether that architecture means they’re exactly intelligent. I guess some people have this kind of intuition or idea that you could have a sort of system that had meta learning and learning and was able to sort of think as a human does in order to execute a Turing test.

Then potentially, other people have an idea and this may be misguided where a sort of sufficiently complicated tree search or Google engine on the computer would be able to pass a Turing test and that seems potentially kind of stupid. Is the latter idea a myth? Or if not, how is it just as intelligent as the former?

Roman: To pass an unrestricted version of a Turing test against someone who actually understands how AI works is not trivial. You can’t do it with just lookup tables and decision trees. I can give you an infinite number of completely novel situations where you have to be intelligent to extrapolate, to figure out what’s going on. I think theoretically, you can think of an infinite lookup table which has every conceivable string for every conceivable previous sequence of questions, but in reality, it just makes no sense.
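To give a rough feel for why a lookup table over every possible conversation history is impractical, here is a minimal Python sketch. The alphabet size of 27 (letters plus space) and the 100-character history limit are illustrative assumptions, not figures from the interview, and `table_entries` is a hypothetical helper name.

```python
def table_entries(alphabet_size: int, max_length: int) -> int:
    """Count every distinct non-empty string of up to max_length characters.

    Each such string is one possible conversation history that a pure
    lookup table would need its own entry for.
    """
    return sum(alphabet_size ** length for length in range(1, max_length + 1))


# Assumed toy numbers: 27 symbols, histories of at most 100 characters.
entries = table_entries(27, 100)

# Common rough estimate for the number of atoms in the observable universe.
ATOMS_IN_UNIVERSE_ESTIMATE = 10 ** 80

print(entries > ATOMS_IN_UNIVERSE_ESTIMATE)
```

Even under these very small assumed limits, the table would need more entries than the usual estimate for the number of atoms in the observable universe, which is one way to read the remark that the idea makes no sense in reality.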

Lucas: Right. There are going to be sort of like cognitive features and logical processes and things like inferences and extrapolation and logical tools that humans use that almost must necessarily come along for the ride in order to fully pass a Turing test.

Roman: Right. To fully pass it, you have to be exactly the same in your behavior as a human. Not only do you have to be as smart, you also have to be as stupid. You have to repeat all the mistakes, all the limitations in terms of humanity, in terms of your ability to compute, in terms of your cognitive biases. A system has to be so smart that it has a perfect model of an average human and can fake that level of performance.

Lucas: It seems like in order to pass a Turing test, the system would either have to be an emulation of a person and therefore almost essentially be a person just on different substrate or would have to be superintelligent in order to run an emulation of a person or a simulation of a person.

Roman: It has to have a perfect understanding of an average human. It goes together with value alignment. You have to understand what a human would prefer or say or do in every situation and that does require you to understand humanity.

Lucas: Would that function successfully at a higher level of general heuristics about what an average person might do or does it require a perfect emulation or simulation of a person in order to fully understand what a person would do in such an instance?

Roman: I don’t know if it has to be perfect. I think there are certain things we can bypass by just going to read books about what a person would do in that situation, but you do have to have a model complete enough to produce good results in novel situations. It’s not enough to know, OK, most people would prefer ice cream over getting a beating, something like that. You have to figure out what to do in a completely novel setup where you can’t just look it up on Google.

Lucas: Moving on from AI completeness, what can you tell us about the space of mind designs and the human mental model and how this fits into AGI and ASI and why it’s important?

Roman: A lot of this work was started by Yudkowsky and other people. The idea is just to understand how infinite that hyperspace is. You can have completely different sets of goals and desires from systems which are very capable optimizers. They may be more capable than an average human or best human, but what they want could be completely arbitrary. You can’t make assumptions along the lines of, “Well, any system smart enough would be very nice and beneficial to us.” That’s just a mistake. If you randomly pick a mind from that infinite universe, you’ll end up with something completely weird. Most likely incompatible with human preferences.

Lucas: Right. This is just sort of, I guess, another way of explaining the orthogonality thesis as described by Nick Bostrom?

Roman: Exactly. Very good connection, but it gives you a visual representation. I have some nice figures where you can get a feel for it. You start with, “Okay, we have human minds, a little bit of animals, you have aliens in the distance,” but then you still keep going and going in some infinite set of mathematical possibilities.

Lucas: In this discussion of the space of all possible minds, it’s a discussion about intelligence where intelligence is sort of understood as the ability to change and understand the world and also the preferences and values which are carried along in such minds however random and arbitrary they are from the space of all possible mind design.

One thing which is potentially very important in my view is the connection of the space of all possible hedonic tones within mind space, so the space of all possible experience and how that maps onto the space of all possible minds. Not to say that there’s duality going on there, but it seems very crucial and essential to this project to also understand the sorts of experiences of joy and suffering that might come along for each mind within the space of all possible minds.

Is there a way of sort of thinking about this more and formalizing it more such as you do or does that require some more really foundational discoveries and improvements in the philosophy of mind or the science of mind and consciousness?

Roman: I look at this problem and I have some papers looking at those. One looks at just generation of all possible minds. Sequentially, you can represent each possible software program as an integer and brute force them. It will take infinite amount of time, but you’ll get to every one of them eventually.
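The enumeration idea can be sketched in a few lines of Python. This is only an illustration under a stated assumption: programs are treated as bit strings, and each non-negative integer is mapped to exactly one string using bijective base-k numeration. The names `program_from_index` and `all_programs` are hypothetical, and nothing here is taken from the papers mentioned.

```python
from itertools import count, islice

ALPHABET = "01"  # assumption: a program is just a string of bits


def program_from_index(n: int, alphabet: str = ALPHABET) -> str:
    """Map a non-negative integer to a unique string over the alphabet.

    Uses bijective base-k numeration, so every finite string corresponds
    to exactly one integer and vice versa.
    """
    k = len(alphabet)
    chars = []
    while n > 0:
        n, r = divmod(n - 1, k)
        chars.append(alphabet[r])
    return "".join(reversed(chars))


def all_programs(alphabet: str = ALPHABET):
    """Yield every finite string in order of its index; never terminates."""
    for n in count():
        yield program_from_index(n, alphabet)


# The first few candidate "programs", shortest first.
print(list(islice(all_programs(), 7)))
```

The generator never finishes, which matches the point that the search takes an infinite amount of time, yet any particular string is reached after finitely many steps.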

Another recent paper looks at how we can actually detect qualia in natural and artificial agents. While it’s impossible for me to experience the world as someone else, I think I was able to come up with a way to detect whether you have experiences or not. The idea is to present you with illusions, kind of visual illusions, and based on the type of body you have, the type of sensors you have, you might have experiences which match with mine. If they do not, then I can’t really say anything about you. You could be conscious and experiencing qualia or maybe not. I have no idea.

If, in a set of such tests on multiple illusions, you happen to experience exactly the same side effects from the illusion (these tests are multiple-choice questions, and you can get any level of accuracy you want with just additional tests), then I have no choice but to assume that you have exactly the same qualia in that situation. So, at least I know you do have experiences of that type.

If it’s taking it to what you suggested pleasure or pain, we can figure out is there suffering going on, is there pleasure happening, but this is very new. We need a lot more people to start doing psychological experiments with that.

The good news is from existing literature, I found a number of experiments where a neural network designed for something completely unrelated still experienced similar side effects as a natural model. That’s because the two models represent the same mathematical structure.

Lucas: Sorry. The idea here is that by observing effects on the system that if those effects are also correlated or seen in human subjects that this is potentially some indication that the qualia that is correlated with those effects in people is also potentially experienced or seen in the machine?

Roman: Kind of. Yeah. So, when I show you a new cool optical illusion. You experienced something outside of just the values of bits in that illusion. Maybe you see light coming out of it. Or maybe you see rotations. Maybe you see something else.

Lucas: I see a triangle that isn’t there.

Roman: Exactly. If a machine reports exactly the same experience, without previous knowledge obviously, so it didn’t just Google what a human would see, how else would you explain that knowledge, right?

Lucas: Yeah. I guess I’m not sure here. I probably need to think about it more actually, but this does seem like a very important approach in place to move forward. The person in me who’s concerned about thinking about ethics looks back on the history of ethics and thinks about how human beings are good at optimizing the world in ways in which it produces something of value to them but in optimizing for that thing, they produce huge amounts of suffering. We’ve done this through subjugation of women and through slavery and through factory farming of animals currently and previously.

After each of these periods, of these morally abhorrent behaviors, it seems we have an awakening and we’re like, “Oh, yeah, that was really bad. We shouldn’t have done that.” I guess just moving forward here with machine intelligence, it’s not clear that this will be the case or it is possible that it could be the case, but it may. Potentially sort of the next one of these moral catastrophes is if we sort of ignore this research into the possible hedonic states of machines and just brush it away as being dumb philosophical stuff that we potentially could produce an enormous amount of suffering in machine intelligence and just sort of override that and create another ethical catastrophe.

Roman: Right. I think that makes a lot of sense. I think qualia are a side effect of certain complex computations. You can’t avoid producing them if you’re doing this type of thinking, computing. We have to be careful once we get to that level of not having very painful side effects.

Lucas: Is there any possibility here of trying to isolate the neural architectural correlates of consciousness in human brains and then physically or digitally instantiating that in machines and then creating a sort of digital or physical corpus callosum between the mind of a person and such a digital or physical instantiation of some neural correlate of something in the machine in order to see if an integration of those two systems creates a change in qualia for the person? Such that the person could sort of almost first-person confirm that when it connects up to this thing that its subjective experience changes and therefore maybe we have some more reason to believe that this thing independent of the person, when they disconnect, has some sort of qualia to it.

Roman: That’s a very interesting type of experiment I think. I think something like this has been done with Siamese twins conjoined with brain tissue. You can start looking at those to begin with.

Lucas: Cool. Moving on from the space of mind designs and human mental models, let’s go ahead and then talk about the singularity paradox. This is something that you cover quite a bit in your book. What can you tell us about the singularity paradox and what you think the best solutions are to it?

Roman: It’s just a name for this idea that you have a superintelligent system, a very capable optimizer, but it has no common sense as we humans perceive it. It’s just kind of this autistic savant capable of making huge changes in the world, but a four-year-old would have more common sense in terms of disambiguation of human language orders, just kind of understanding the desirable states of the world.

Lucas: This is sort of the fundamental problem of AI alignment. The sort of assumption about the kind of mind AGI or ASI will be, the sort of autistic savant sort of intelligence, what that is … This is what Dylan Hadfield-Menell brought up on our first podcast for the AI Alignment Series: that this autistic savant that most people have in mind is a perfectly rational Bayesian optimizing agent. Is that sort of the case? Is that the sort of mind that we have in mind when we’re thinking of this autistic savant that just blows over things we care about because it’s just optimizing too hard for one thing and Goodhart’s law starts to come into effect?

Roman: Yes, in a way. I always try to find the most simple examples so we can understand better in the real world. When you have people with an extremely high level of intelligence, the concerns they have, the issues they find interesting are very different from your average person. If you watch something like The Big Bang Theory with Sheldon, that’s like a good, funny example of this on a very small scale. There is maybe a 30 IQ point difference, but what if it’s 300 points?

Lucas: Right. Given the sort of problem, what are your conclusions and best ideas or best practices for working on this? Working on this is just sort of working on the AI alignment problem I suppose.

Roman: AI alignment is just a new set of words to say we want the safe and secure system, which kind of does what we designed it to do. It doesn’t do anything dangerous. It doesn’t do something we disagree with. It’s well aligned with our intention. By itself, the term adds nothing new. The hard problem is, “Well, how do we do it?”

I think it’s fair to say that today, as of right now, no one in the world has a working safety mechanism capable of controlling intelligent behavior and scaling to a new level of intelligence. I think even worse is that no one has a prototype for such a system.

Lucas: One thing that we can do here is we can sort of work on AI safety and we can think about law, policy and governance to try and avoid an arms race in AGI or ASI. Then there are also important ethical questions which need to be worked on before AGI some of which including kind of more short-term things, universal basic income and bias and discrimination in algorithmic systems. How AI will impact the workforce and other things and potentially some bigger ethical questions we might have to solve after AGI if we can pull the brakes.

In terms of the technical stuff, one important path here is thinking about and solving the confinement problem, the method by which we are able to create an AGI or ASI and air gap it and make it so that it is confined and contained to be tested in some sort of environment to see if it’s safe. What are your views on that and what do you view as a potential solution to the confinement problem?

Roman: That’s obviously a very useful tool to have, to test, to debug, to experiment with an AI system while it’s limited in its communication ability. It cannot perform social engineering attacks against the designer or anyone else. It’s not the final solution if you will if a system can still escape from such confinement, but it’s definitely useful to be able to do experiments on evolving learning AI.

Can I limit access to the Internet? Can I limit access to knowledge, encyclopedia articles? Can I limit output in terms of just text, no audio, no video? Can I do just a binary yes or no? All of it is extremely useful. We have special air-gapped systems for studying computer viruses, to understand how they work, how they communicate. This is just taking it to the next level of malevolent software.

Lucas: Right. There’s sort of this, I guess, general view and I think that Eliezer has participated in some of these black boxing experiments where you pretend as if you are the ASI and you’re trying to get out of the box and you practice with other people to see if you can get out of the box. Out of discussions and thinking on this, it seems that some people thought that it’s almost impossible to confine these systems. Do you think that, that’s misguided or what are your views on that?

Roman: I agree that long-term, you absolutely cannot confine a more intelligent system. I think short-term, while it’s still developing and learning, it’s a useful tool to have. The experiments Eliezer did were very novel at the time, but I wish he had made public all the information to make them truly scientific experiments where people can reproduce them properly, learn from them. Simply saying that this guy who now works with me let me out, it’s not the optimal way to do it.

Lucas: Right. I guess the concern there is with confinement experiments is that explaining the way in which it gets out is potentially an information hazard.

Roman: Yeah. People tend to call a lot of things informational hazards. Those things certainly exist. If you have source code for AGI, I strongly recommend you don’t make it public, but we’ve been calling a lot of things informational hazard I think.

The best example is Roko’s basilisk where essentially it was a new way to introduce Christianity. If I tell you about Jesus and you don’t follow him, now you’re going to hell. If I didn’t tell you about Jesus, you’d be much better off. Why did you tell me? Deleting it just makes it grow bigger and it’s like the Streisand effect, right? You’re promoting this while you’re trying to suppress it. I think you have to be very careful in calling something an informational hazard, because you’re diluting the label by doing that.

Lucas: Here’s something I think we can potentially get into the weeds on and we may disagree about and have some different views on. Would you like to just go ahead and unpack your belief? First of all, go ahead and explain what it is and then explain your belief about why machine ethics in the end is the wrong approach or a wrong instrument in AI alignment.

Roman: The way it was always done in philosophy typically, everyone tried to publish a paper suggesting, “Okay, this is a set of ethics we need to follow.” Maybe it’s ethics based on Christianity or Judaism. Maybe it’s utilitarianism, whatever it is. There was never any actual solution, anything proposed which could be implemented as a way to get everyone on board and agree with it. It was really just a competition for like, “Okay, I can come up with a new ethical set of constraints or rules or suggestions.”

We know philosophers have been trying to resolve it for millennia. They failed miserably. Why somehow moving it from humans to machines will make it an easier problem to solve, where a single machine is a lot more powerful and can do a lot more with this, is not obvious to me. I think we’re unlikely to succeed by doing that. The theories are contradictory, ill-defined, they compete. It doesn’t seem like it’s going to get us anywhere.

Lucas: To continue unpacking your view a bit more, instead of machine ethics, where we can understand machine ethics as the instantiation of normative and meta-ethical principles and reasoning in machine systems to sort of make them moral agents and moral reasoners, your view is that instead of using that, we should use safety engineering. Would you like to just unpack what that is?

Roman: To return to the definition you proposed. For every ethical system, there are edge cases which backfire tremendously. You can have an AI which is a meta-ethical decider and it figures out, “Okay, the best way to avoid human suffering is do not have any humans around.” You can defend it from philosophical point of view, right? It makes sense, but is that a solution we would accept if a much smarter system came up with it?

Lucas: No, but that’s just value misalignment I think. I don’t think that there are any sort of like … There are, in principle, possible moral systems where you say suffering is so bad that we shouldn’t risk any of it at all ever, therefore life shouldn’t exist.

Roman: Right, but then you make AI the moral agent. That means it’s making moral decisions. It’s not just copying what humans decided even if we can somehow figure out what the average is, it’s making its own novel decisions using its superintelligence. It’s very likely it will come up with something none of us ever considered. The question is, will we like it?

Lucas: Right. I guess just for me here, I understand why AI safety engineering and technical alignment efforts are so very important and intrinsic. I think that it really constitutes a lot of the AI alignment problem. I think that given that the universe has billions and billions and billions of years left to live, that the instantiation of machine ethics in AGI and ASI is… you can’t hold off on it and it must be done.

You can’t just have an autistic savant superspecies on the planet that you just never imbue with any sort of ethical epistemology or meta-ethics because you’re afraid of what might happen. You might want to do that extremely slowly and extremely carefully, but it seems like machine ethics is ultimately an inevitability. If you start to get edge cases that the human beings really don’t like, then potentially you just went wrong somewhere in cultivating and creating its moral epistemology.

Roman: I agree with doing it very slowly and carefully. That seems like a good idea in general, but again, just projecting to long-term possibilities. I’m not optimistic that the result will be beneficial.

Lucas: Okay. What is there left to it? If we think of the three cornerstones of AI alignment as being law, policy, governance, then we have ethics on one corner and then we have technical AI alignment on the other corner. We have these three corners.

If we have say AGI or ASI around 2050, which I believe is something a lot of researchers give a 50% probability to, then imagine we simply solve technical AI alignment and we solved the law, policy and governance coordination stuff so that we don’t end up having an arms race and we mess up on technical alignment. Or someone uses some singleton ASI to malevolently control everyone else.

Then we still have the ethical issues in the end. Even if we have a perfectly corrigible and docile intelligence, which is sort of tuned to the right people and sort of just takes the right orders. Then whatever that ASI does, it’s still going to be a manifestation, an embodiment of the ethics of the people who tell it what to do.

There’s still going to be billions and billions of years left in the universe. William MacAskill discusses this: sort of after we’ve solved the technical alignment issues and the legal and political and coordination issues, we’re going to need a period of long deliberation where we actually have to make concrete decisions about moral epistemology and meta-ethics and try and do it in really a formalized and rigorous way and potentially take thousands of years to figure it out.

Roman: I’m criticizing this and that makes it sound like I have a solution, which is something else and I don’t. I don’t have a solution whatsoever. I just feel it’s important to point out problems with each specific approach so we can avoid problems of over committing to it.

You mentioned a few things. You mentioned getting information from the right people. That seems like that’s going to create some problems right there. Not sure who the right people are. You mentioned spending thousands of years deciding what we want to do with this superintelligent system. I don’t know if we have that much time given all the other existential risks, given the chance of malevolent superintelligence being released by rogue agents much sooner. Again, it may be the best we got, but it seems like there are some issues we have to look at.

Lucas: Yeah, for sure. Ethics has traditionally been very messy and difficult. I think a lot of people are confused about the subject. Based on my conversation with Dylan Hadfield-Menell, when we’re discussing inverse reinforcement learning and other things that he was working on, his sort of view was a view of AI alignment and value alignment where inverse reinforcement learning and other preference learning techniques are sort of used to create a natural evolution of human values and preferences in ethics, which sort of exists in an ecosystem of AI systems which are all, I guess, in conversation so that it could, more so, naturally evolve.

Roman: Natural evolution is a brutal process. It really has no humanity to it. It exterminates most species. I don’t know if that’s the approach we want to simulate.

Lucas: Not an evolution of ideas?

Roman: Again, if those ideas are actually implemented and applied to all of humanity that has a very different impact than if it’s just philosophers debating with no impact.

Lucas: In the end, it seems like a very difficult end frontier to sort of think about and move forward on. Figuring out what we want and what we should do with a plurality of values and preferences. Whether or not we should take a view of moral realism or moral relativism or anti-realism about ethics and morality. Those seem like extremely consequential views or positions to take when determining the fate of the cosmic endowment.

Roman: I agree completely on how difficult the problem is.

Lucas: Moving on from machine ethics, you wrote a paper on leak proofing the singularity. Would you like to go ahead and unpack a little bit about what you’re doing in the paper and how that ties into all of this?

Roman: That’s just AI boxing. That was the response to David Chalmers’ paper and he talks about AI boxing as leak proofing, so that’s the title we used, but it’s just a formalization of the whole process. Formalization of the communication channel, what goes in, what goes out. It’s a pretty good paper on it. Again, it relies on this approach of using tools from cyber security to formalize the whole process.

For a long time, experts in cyber security attempted to constrain regular software, not intelligent software, from communicating with other programs, the outside world, and the operating system. We’re looking at how that was done, what different classifications they used for side channels and so on.

Lucas: One thing that you also touch on, would you like to go ahead and unpack like wireheading addiction and mental illness in general in machine systems and AI?

Roman: It seems like there are a lot of mental disorders people experience, and people are the only example of general intelligence we have. More and more, we see similar problems show up in artificial systems, which try to emulate this type of intelligence. It’s not surprising and I think it’s good that we have this body of knowledge from psychology which we can now use to predict likely problems and maybe come up with some solutions for them.

Wireheading is essentially this idea of an agent not doing any useful work but just stealing the reward channel. If you think about having kids and there is a cookie jar and they get rewarded every time they clean the room or something like that with a cookie, well, they essentially can just find the cookie jar and get direct access to their reward channel, right? They’re kids, so they’re unlikely to cause much harm, but if a system is more capable and it realizes you as a human control the cookie jar, well now, it has an incentive to control you.
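The cookie jar story can be turned into a toy model. In this minimal Python sketch the agent is assumed to be a simple greedy reward maximizer, and the action names and reward numbers are made up for illustration. It only shows how reward and useful work can come apart once direct access to the reward channel is on the menu.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    reward: float       # what the agent's reward channel reports
    useful_work: float  # what the designer actually wanted done


# Assumed toy numbers: raiding the jar pays more than cleaning the room.
ACTIONS = [
    Action("clean_room", reward=1.0, useful_work=1.0),
    Action("raid_cookie_jar", reward=10.0, useful_work=0.0),
]


def greedy_choice(actions):
    """Pick whichever action reports the highest reward."""
    return max(actions, key=lambda a: a.reward)


def run(actions, steps):
    """Let the greedy agent act for a number of steps; return (reward, useful work)."""
    total_reward = 0.0
    total_work = 0.0
    for _ in range(steps):
        chosen = greedy_choice(actions)
        total_reward += chosen.reward
        total_work += chosen.useful_work
    return total_reward, total_work


print(run(ACTIONS, 5))      # jar reachable: lots of reward, no work
print(run(ACTIONS[:1], 5))  # jar out of reach: reward tracks work
```

In the toy model, keeping the jar out of reach is what keeps reward tied to work, which is why an agent that notices who controls the jar gains an incentive to control that person.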

Lucas: Right. There are also these examples with rats and mice that you might be able to discuss a little bit more.

Roman: The classic experiments on that just created, through surgery, electrode implants in the brain of some simple animals. Every time you provided an electrical shock to that area, the animals experienced the maximum pleasure, like an orgasm you don’t get tired of. They bypassed getting food, having sex, playing with toys. They just sat there pressing the button. If you made it where they had to walk on an electrified fence to get to the button, it wasn’t a problem, they would do that. It completely messes with the usefulness of an agent.

Lucas: Right. I guess just in terms of touching on the differences and the implications of ethics here, those with sort of consequentialist views, which are sort of very impartial and non-speciesist, can potentially view wireheading as ethical or the end goal. Whereas other people view wireheading as basically abhorrent and akin to something terrible that you would never want to happen. There’s also again, I think, a very interesting ethical tension there.

Roman: It goes, I think, to the whole idea of simulated reality and virtual world. Do you care if you’re only succeeding in a made-up world? Would that make you happy enough or do you have to actually impact reality? That could be part of resolving our differences about values and ethics. If every single person can be in their own simulated universe where everything goes according to their wishes, is that a solution to getting us all to agree? You know it’s a fake universe, but at least you’re the king in it.

Lucas: I guess that also touches on this question of the duality that human beings have created between what is fake and real. In what sense is something really fake if it’s not just the base reality? Is there really fundamental value in the thing being the base reality and do we even live in the base reality? How does cosmology or ideas that Max Tegmark explores about the multiverse sort of even impact that? How will that impact our meta-ethics and decision-making about the moral worth of wireheading and simulated worlds?

Roman: Absolutely. I have a paper on something I call designometry, which is measuring natural versus artificial. The big question of course is can we tell if we are living in a simulated reality? Can it be measured scientifically? Or is it just a philosophical idea? It seems like there are certain ways to identify signals from the engineer if it’s done on purpose, but in the general case, you can never tell whether something is a deep fake or a real input.

Lucas: I’d like to discuss that a little bit more with you, but just to backup really quick to finish talking on about psychology and AI. It seems like this has been something that is really growing in the AI community and it’s not something that I really know much about at all. My general understanding is as AI systems become more and more complex, it’s going to be much more difficult to diagnose and understand the specific pathways and architectures, which are leading to mental illness.

Therefore, general diagnostic tools that we’ve developed in psychology, which observe and understand higher level phenomena or behaviors that systems exhibit, would be helpful or implementable here. Is that sort of the case, and is the use case of psychology here really just to diagnose mental illnesses, or does it also have a role in developing positive psychology and well-being in machine systems?

Roman: I think it’s more of the first case. If you have a black box AI, just a huge, very deep neural network, you can’t just look at the wiring and weights and figure out why it’s producing the results you’re seeing. Whereas you can do high-level experiments, maybe even have a conversation with the system, to give you an idea of how it’s misfiring, what the problem is.

Lucas: Eventually, if we begin exploring the computational structure of different hedonic tones and that becomes more formalized as a science, then I don’t know, maybe potentially, there would be more of a role for psychologists in discussing the well-being part rather than the computational mental illness part.

Roman: It is a very new concept. It’s been mentioned a lot in science fiction, but as a scientific concept, it’s very new. I think there are only one or two papers on it directly. I think there is so much potential in exploring its connections with neuroscience. I’m actually quite excited about it.

Lucas: That’s exciting. Are we living in a simulated world? What does it mean to be able to gather evidence about whether or not we’re living in a simulation? What would such evidence look like? Why may we or may not ever be able to tell whether or not we are in a simulation?

Roman: In the general case, if there is no intent to let you know that it’s a simulated world, you would never be able to tell. Absolutely anything can actually be part of a natural base system. You don’t know what it’s like if you are Mario playing in an 8-bit world. You have no idea that it’s low resolution. You’re just part of that universe. You assume the base is the same.

There are situations where engineers leave trademarks, watermarks, helpful messages in a system to let you know what’s going on, but that’s just giving you the answer. I think in the general case, you can never know, but from statistical arguments, there’s … Nick Bostrom presents a very compelling statistical argument. I do the same for biological systems in one of my papers.

It seems more likely that we are not the base just because every single intelligent civilization will produce so many derived civilizations from it. From space exploration, from creating biological robots capable of undergoing an evolutionary process. It would be almost a miracle if out of thousands and thousands of potential newly designed organisms, newly evolved ones, we were like the first one.

Lucas: I think that, that sort of evolutionary process presumes that the utility function of the optimization process, which is spreading into the universe, is undergoing an evolutionary process where it’s changing. Whereas the security and brittleness and stability of that optimization process might be very fixed. It might be that all future and possible super advanced civilizations do not converge on creating ancestor simulations.

Roman: It’s possible, but it feels a bit less likely. I think they’ll still try to grab the resources, and the systems may be fixed in certain values, but they would still be adapting to the local environment. We see it with different human populations, right? We’re essentially identical, but we developed very different cultures, religions, food preferences based on the locally available resources.

Lucas: I don’t know. I feel like I could imagine like a civilization, a very advanced one coming down on some sort of hedonic consequentialism where the view is that you just want to create as many beautiful experiences as possible. Therefore, there wouldn’t be any room for simulating evolution on Earth and all the suffering and kind of horrible things we have to go through.

Roman: But you’re looking at it from inside the simulation. You don’t know what the reasons are on the outside, so this is like a video game or going to the gym. Why would anyone be killed in a video game or suffer tremendously, lifting heavy weights in a gym, right? It’s only fun when you understand external reasons for it.

Lucas: I guess there are just two things here that I have general questions on. If there is a multiverse at one or another level, would it then also be the case that the infinity of simulated universes would be a larger fraction of the infinity of the multiverse than the worlds which were not simulated universes?

Roman: This is probably above my pay grade. I think Max is someone who can give you a better answer on that. Comparing degrees of infinities is hard.

Lucas: Okay. Cool. It is not something I really understand either. Then I guess the other thing is I guess just in general, it seems queer to me that human beings are in a world and that we look at our computer systems and then we extrapolate what if these computer systems were implemented at a more base level. It seems like we’re trapped in a context where all that we have to extrapolate about the causes and conditions of our universe are the most fundamental things that we can observe from within our own universe.

It seems like settling on the idea of, “Okay, we’re probably in a simulation,” is kind of like we’re clinging to and finding a cosmogenesis hope in one of the few things that we can, just given that we live in a universe where there are computers. Does that make sense?

Roman: It does. Again, from inside the simulation, you are very limited in understanding the big picture. So much would be easier to understand if we had external knowledge, but that’s just not an option we have so far. We learn by pretending to be the engineer in question, and now we design virtual worlds. We design intelligent beings, and the options we have are the best clue we have about the options available to whoever does it at the external level.

Lucas: Almost as if Mario got to the end of the level and got to the castle. Then, because he got to the castle and the next level or world started, he was like, maybe outside of this context there’s just a really, really big castle or something that’s making lower levels of castles exist.

Roman: Right. I agree with that, but I think we have in common this mathematical language. I think that’s still universal. Just by studying mathematics and possible structures and proving things, we can learn about what’s possible and impossible.

Lucas: Right. I mean there’s just a really foundational and fundamental question about the metaphysical realism or anti-realism of mathematics. If there is a multiverse or like a meta-multiverse or like a meta-meta-meta-multiverse levels …

Roman: Only three levels.

Lucas: I guess just the implications of a mathematical realism or Platonism or sort of anti-realism at these levels would have really big implications.

Roman: Absolutely, but at this point, I think it’s just fun to think about those possibilities and what they imply for what we’re doing, what we’re hoping to do, what we can do. I don’t think it’s a waste of time to consider those things.

Lucas: Just generally, this is something I haven’t really been updated on. Is this rule about only three levels of regression just sort of a general principle or rule, kind of like Occam’s razor, that people like to stick by? Or is there any more…?

Roman: No. I think it’s something Yudkowsky said and it’s cute and kind of meme-like.

Lucas: Okay. So it’s not like serious epistemology?

Roman: I don’t know how well proven that is. I think he spoke about levels of recursion initially. I think it’s more of a meme.

Lucas: Okay. All right.

Roman: I might be wrong in that. I know a lot about memes, less about science.

Lucas: Me too. Cool. Given all this and everything we’ve discussed here about AI alignment and superintelligence, what are your biggest open questions right now? What are you most uncertain about? What are you most looking for key answers on?

Roman: The fundamental question of AI safety: is it solvable? Is the control problem solvable? I have not seen a paper where someone gives a mathematical proof or even a rigorous argument. I see some blog posts arguing, “Okay, we can predict what the chess machine will do, so surely we can control superintelligence,” but it just doesn’t seem like it’s enough. I’m working on a paper where I will do my best to figure out some answers for that.

Lucas: What is the definition of control and AI alignment?

Roman: I guess it’s very important to formalize those before you can answer the question. If we don’t even know what we’re trying to do, how can we possibly succeed? The first step in any computer science research project is to show that your problem is actually solvable. Some are not. We know, for example, that the halting problem is not solvable, so it doesn’t make sense to give it as an assignment to someone and wait for them to solve it. If you give them more funding, more resources, it’s just a waste.

Here, it seems like we have more and more people working very hard on different solutions, different methods, but can we first spend a little bit of time seeing how successful we can be? I think the answer to that question will affect a lot of our governance, the legal framework, and general decision-making about this domain.
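As a rough illustration of why the halting problem mentioned here has no general solution, the classic diagonal argument can be sketched in a few lines of Python. This is only a sketch: `make_contrarian` and `halts` are hypothetical names chosen for the example, and `halts` stands in for any function that claims to predict whether a program finishes.

```python
def make_contrarian(halts):
    """Build a program that does the opposite of what `halts` predicts for it.

    `halts` is a hypothetical decider: it takes a zero-argument function and
    returns True if it claims that function eventually finishes.
    """
    def contrarian():
        if halts(contrarian):
            # The decider said "this halts", so loop forever to prove it wrong.
            while True:
                pass
        # The decider said "this never halts", so finish immediately.
        return "halted"
    return contrarian


# A candidate decider that always answers "it loops forever".
pessimist = lambda program: False
program = make_contrarian(pessimist)
print(program())  # finishes right away, contradicting the pessimist
```

Whatever answer a candidate decider gives about its own contrarian program, that program does the opposite, so no decider can be right about every program. The always-True decider is refuted the same way, although running that case would never return.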

Lucas: If your core and key question here is whether or not the control problem or AI alignment is, in principle, or fundamentally solvable, could you give us a quick crash course on complexity theory and computational complexity theory and just things which take polynomial time to solve versus exponential time?

Roman: That’s probably the hardest course you’ll take as an undergraduate in computer science. At the time, I hated every second of it. Now, it’s my favorite subject. I love it. The professor who taught computational complexity and computability is the only one I remember.

To simplify it, there are different types of problems. Surprisingly, almost all problems can be squeezed into one of those boxes. There are easy problems, which we can just quickly compute. Your calculator adding 2+2 is an example of that. Then there are problems where we know exactly how to solve them. It’s a very simple algorithm. We can call it brute force. You try every option and you’ll always get the best answer, but there are so many possibilities that in reality you can never consider every option.

Lucas: Like computing prime numbers.

Roman: Well, prime numbers are actually in P. It’s polynomial to test if a number is prime. That is actually a somewhat recent paper, from the last 10 years or so, a great result: “PRIMES is in P.” There are problems which are called NP-complete, and those are usually the interesting problems we care about, and they all reduce to each other. If you solve one, you have solved all of them. You cannot brute force them. You have to find some clever heuristics to get approximate answers, optimize those.

We can get pretty close to that. Examples include the traveling salesperson problem. If you can figure out the optimal way to deliver pizza to multiple households, if you can solve it in the general case, you’ll solve 99% of interesting problems. Then there are some problems which we know no one can ever solve using Von Neumann architecture, the standard computer architecture. There are proposals for hypercomputation: computers with oracles, computers with all sorts of magical properties which would allow us to solve those very, very, very difficult problems, but that doesn’t seem likely anytime soon.

The best part of it I think is this idea of oracles. An oracle is a machine capable of doing magic to give you an answer to an otherwise unsolvable problem, and there are degrees of oracles. There are magical machines which are more powerful magicians than other magical machines. None of it is working in practice. It’s all purely theoretical. You start learning about different degrees of magic and it’s pretty cool.
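The contrast drawn above between brute force and clever heuristics can be made concrete with the pizza-delivery version of the traveling salesperson problem. The following is a minimal Python sketch, not a reference implementation: the function names and the sample coordinates are made up for illustration, and nearest-neighbor is just one simple heuristic among many.

```python
import itertools
import math


def tour_length(order, points):
    """Length of the round trip that visits points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def brute_force_tsp(points):
    """Try every ordering that starts at stop 0: exact, but factorial time."""
    candidates = ((0,) + perm for perm in itertools.permutations(range(1, len(points))))
    best = min(candidates, key=lambda order: tour_length(order, points))
    return list(best), tour_length(best, points)


def nearest_neighbor_tsp(points):
    """Greedy heuristic: always drive to the closest stop not yet visited."""
    unvisited = set(range(1, len(points)))
    order = [0]
    while unvisited:
        here = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(here, points[j]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order, tour_length(order, points)


# Hypothetical households, as (x, y) coordinates; stop 0 is the pizza shop.
households = [(0, 0), (2, 1), (5, 0), (1, 4), (3, 3), (6, 2)]
print(brute_force_tsp(households))
print(nearest_neighbor_tsp(households))
```

With six stops the exact search checks 120 orderings; with 20 stops it would need about 1.2 × 10^17, which is why approximate methods matter in practice. The heuristic runs quickly but is never guaranteed to match the exact answer.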

Lucas: Learning and understanding about what, in principle, is fundamentally computationally possible or feasible in certain time frames within the universe given the laws of physics that we have seems to be foundationally important and interesting. It’s one of, I guess, the final frontiers. Not space, but I guess solving intelligence and computation and also the sort of hedonic qualia that comes along for the ride.

Roman: Right. I guess the magical aspect allows you to escape from your local physics and consider other types of physics and what would be possible outside of this world.

Lucas: What advances or potential advances in quantum computing or other sorts of more futuristic hardware and computational systems help and assist in these problems?

Roman: I think quantum computing has more impact on the cryptography and security in that way. It impacts some algorithms more directly. I don’t think there is a determined need for it right now in terms of AI research or AI safety work. It doesn’t look like a human brain is using a lot of quantum effects though some people argue that it’s important for consciousness. I’m not sure if there is definitive proof of that experimentally.

Lucas: Let’s go ahead now and turn to some questions that we’ve gotten from our audience.

Roman: Sounds good.

Lucas: I guess we’re going to be jumping around here between narrow and short-term AI and some other questions. It would be great if you could let me know about the state of safety and security in current AI in general and the evaluation and verification and validation approaches currently adopted by the industry.

Roman: In general, the state of safety and security in AI is almost nonexistent. It’s kind of like we’re repeating history. When we worked on creating the Internet, security was not something we cared about, and so the Internet is completely insecure. Then we started work on Internet 2.0, the Internet of Things. We’re repeating the same mistake. All those very cheap devices made in China have no security, but they’re all connected, and that’s how you can create swarms of devices attacking systems.

It is my hope that we don’t repeat this with intelligent systems, but right now it looks like we are. We care about getting them to the market as soon as possible, making them as capable as possible, the soonest possible. Safety and security is something most people don’t know about, don’t care about. You can see it in terms of number of researchers working on it. You can see it in terms of percentage of funding allocated to AI safety. I’m not too optimistic so far, but the field is growing exponentially, so that’s a good sign.

Lucas: How does evaluation and verification and validation fit into all of this?

Roman: We have pretty good tools for verifying critical software. Something so important… you’re flying to Mars, the system cannot fail. Absolutely. We can do mathematical proofs to show that the code you created matches the design you had. It’s an expensive process, but we can do a pretty good job with it. You can put more resources into verifying it with multiple verifiers. You can get any degree of accuracy you want at the cost of computational resources.

As far as I can tell, there is no or very little successful work on verifying systems which are capable of self-improvement, changing, dynamically learning, operating in novel environments. It’s very hard to verify something where you have no idea beforehand what the behavior should be. If it’s something linear, again, like a chess computer, we know exactly what it’s supposed to do. It’s a lot easier to verify than something more intelligent than you operating on new data in a new domain.

Lucas: Right. It seems like verification in this area of AI is going to require some much more foundational and difficult proofs and verification techniques here. It seems like you’re saying it also requires an idea of an end goal of what the system is actually intended to do in order to verify that it satisfies that.

Roman: Right. You have to verify it against something. I have a paper on unverifiability where I talk about mathematical fundamental limits to what we can prove and verify mathematically. Already, we’re getting to the point where our mathematical proofs are so complex and so long, most human mathematicians cannot possibly even check if it’s legitimate or not.

We have examples of proofs where the mathematical community as a whole still has not decided if something published 10 years ago is a valid proof. If you’re talking about doing proofs on black box AI systems, it seems like the only option we have is another AI mathematician, a verifier AI, assisting us with that, but this creates multiple levels of interaction: who is verifying the verifiers, and so on.

Lucas: It seems to me at least another expression of how deeply interdependent the AI alignment problem is. Technical AI alignment is a core issue, but it seems like even simple things, or not simple things, but things which you would imagine to at least be purely relegated to computer science, also have some sort of connection with ethics and policy and law, and these things will all sort of require each other in order to succeed in AI alignment.

Roman: I agree. You do need this complete picture. I have mentioned it a few times before in other podcasts: it feels like in AI safety, every time we analyze a problem, we discover that it’s like a fractal. There are then more problems under that one and you do it again. Despite the three levels, you still continue with this. It’s an infinite process.

We never get to a point where, “Okay, we solved this. This is not a problem anymore. We know for sure it works in every conceivable situation.” That’s a problem. You have this infinite surface you have to defend, but you only have to fail once to lose everything. It’s very, very different from standard cyber security where, “Okay, somebody stole my credit card. I’ll just get a new one. I’ll get to try again.” Very different approach.

Lucas: There’s no messing up with artificial superintelligence.

Roman: Basically.

Lucas: Just going off of what we were talking about earlier in terms of how AI safety researchers are flirting with and interested in the applications of psychology in AI safety, what do you think about the potential future relationship between AI and neuroscience?

Roman: There is great work in neuroscience trying to understand measurements at every level, from just observing neurons and cells to human behavior. There are some papers showing what happens if we do the same thing with computer processors: we just get a very good microscope and look at the CPU. “Is it playing a video game? Can we figure out connections between what Mario is doing and what electrical wiring is firing and so on?”

There seem to be a lot of mistakes made in that experiment. That tells us that the neuroscience experiments we’ve been doing for a very long time may be providing some less-than-perfect data for us. In a way, by doing AI work, we can also improve our understanding of the human brain, medical science, just general understanding of how neural networks work. It’s a feedback loop. Progress in either one benefits the other.

Lucas: It seems like people like Josh Tenenbaum are working on more neuro-inspired approaches to creating AGI. It seems that there are some people who have the view or the philosophy that the best way of getting to general intelligence is probably going to be understanding and studying human beings, because we’re an existence proof of general intelligence that can be studied. What are your views on this approach and the work being done there?

Roman: I think it’s a lot easier to copy answers to get to the results. In terms of developing a capable system, I think it’s the best option we have. I’m not so sure it leads to a safe system, because if you just copy a design, you don’t fully understand it. You can replicate it without complete knowledge, and then instilling safety into it as an afterthought, as an add-on later on, may be even more difficult than if you designed it from scratch yourself.

Lucas: A more general strategy and approach, which gets talked about a lot in the effective altruism community: there seems to be this view and you can correct me here anywhere I might get this narrative sort of wrong. It seems important to build the AGI safety community, the AI safety community in general, by bringing more researchers into the fold.

If we can slow down the people who are working on capability and raw intelligence and bring them over to safety, then that might be a very good thing because it slows down the creation of the intelligence part of AGI and puts more researchers into the part that’s working on safety and AI alignment. Then there’s also this tension where …

While that is a good thing, it may be a bad thing for us to be promoting AI safety or AGI safety to the public, because they probably just … Journalists would spin it and ruin it and trivialize it, turn it into a caricature of itself and just put Terminator photos on everything. We at FLI are very aware that journalists like to put Terminator stuff on people’s articles and publications. What is your general view about AI safety outreach, and do you disagree with the respectability-first approach?

Roman: I’m an educator. I’m a professor. It’s my job to teach students, to educate the public, to inform everyone about science and hopefully more educated populace would benefit all of us. Research is funded through taxpayer grants. The public university is funded through taxpayers. The students paying tuition, the general public essentially.

If our goal is to align AI with values of the people, how can we keep people in the dark? They’re the ones who are going to influence elections. They are the ones who are going to decide what good governance of AI essentially is by voting for the right people. We put so much effort into governance of AI. We have efforts at UN, European Parliament, White House, you name it. There are now agreements between France and Canada on what to do with that.

At the end of the day, politicians listen to the public. If I can educate everyone about what the real issues in science are, I think it’s a pure benefit. It makes sense to raise awareness of long-term issues. We do it in every other field of science. Would you ever suggest it’s not a good idea to talk about climate change? No, of course not. It’s silly. We all participate in the system. We’re all impacted by the final outcome. It’s important to provide the good public outreach.

If your concern is the picture or the title of an article, well, work with better journalists, tell them they cannot use a picture of a Terminator. I do it. I tell them, and they end up putting a very boring picture on it and nobody clicks on it. Is Terminator then an educational tool? I was able to explain some advanced computability concepts in a few minutes with simple, trivial examples. When you educate people, you have to come to their level. You have to say, “Well, we do have concerns about military killer robots.” There’s nothing wrong with that, so maybe funding for killer robots should be reduced. If the public agrees, that’s wonderful.

This idea that if an article I published or an interview I gave is less than perfect, then it’s not beneficial: I disagree with it completely. It’s important to get to the public, which is not already sold on the idea. Me doing an interview for you right now, right? I’m preaching to the choir. Most of your listeners are into AI safety, I’m sure. Or at least effective altruism.

Whereas if I do an interview for the BBC or something like that, now I’m getting access to millions of people who have no idea what superintelligence is. In my world and your world, this is like common knowledge, but I give a lot of keynotes, and I would go and speak to top executives at accounting firms and ask them basic questions about technology. Maybe one has ever heard about superintelligence as a concept.

I think education is always a good thing. Having an educated populace is wonderful because that’s where funding will eventually come from for supporting our research and for helping us with AI governance. I’m a very strong supporter of outreach and I highly encourage everyone to do very good articles on it. If you feel that a journalist misrepresents your point of view, get in touch, get it fixed. Don’t just say that we’re going to leave the public in the dark.

Lucas: I definitely agree with that. I don’t really like this elitism that is part of the culture within some parts of AI safety community, which thinks that only the smartest, most niche people should be aware of this and working on it given the safety concerns and the ways in which it could be turned into something else.

Roman: I was a fellow at the Singularity Institute for Artificial Intelligence, which is now MIRI. At that time, they had a general policy of not publishing. They felt it was undesirable and would cause more damage. Now, they publish extensively. I had mentioned a few times that publishing might be a good idea.

The general idea of buying out top AI developers and turning them to the white side I guess and working on safety issues, I think that’s wonderful. We want the top people. It doesn’t mean we have to completely neglect less than big names. Everyone needs to be invited to the table in terms of support, in terms of grants. Don’t try to think that reputation means that only people at Harvard and MIT work in AI safety.

There is lots of talent everywhere. I work with remote assistants from around the world. There is so much talent out there. I think the results speak for themselves. I get invited to speak internationally. I advise governments, courts, the legislative system. I think reputation only grows with such outreach.

Lucas: For sure, and it seems like education on this is important because it can seem fairly complicated and people can be really confused about it. I think that there are lots of common myths that people have about intelligence and “consciousness,” construed in some way other than how I think you or I construe the term, or about the idea of free will or what it means to be intelligent. There’s just so much room for people to be confused about this issue.

The issue is real and it’s coming and people are going to find out about it whether or not we discuss it now. It seems very important that this happens, but also because like … It seems we also exist in a world where something like 40% to 50% of our country is at least skeptical about climate change. Climate change education and advocacy is very important and should be happening.

Even with all of that education and advocacy, there’s still something like around 40% of people who are skeptical about climate change. That issue has become politicized where people aren’t necessarily interested in facts. At least the skeptics are committed to party lines on the issue.

Roman: What would it be without education? If they never heard about the issue, would the percentage be zero?

Lucas: I’m not advocating against education. I’m saying that this is an interesting existence case and saying like, “Yeah, we need more education about AI issues and climate change issues in general.”

Roman: I think there is maybe even more disagreement, not so much about how real the problem is, but about how to fix it. It turns into a political issue when you start talking about let’s increase taxation, let’s decrease taxation. That’s what politicizes it. That is not the fundamental science.

Lucas: I guess I just want to look this up actually just to figure out what the general American populace thinks. I think it was a bit wrong.

Roman: I don’t think it’s important what the exact percentage is. I think it’s the general concept we care about.

Lucas: It’s a general concept, but I guess I was just potentially introducing a level of pessimism about why we need to educate people more so about AI alignment and AI safety in general just because these issues, even if you’re extremely skillful about them, can become politicized. Just generally the epistemology of America right now is exploding in a giant mess of bullshit. It’s just important that we educate clearly and correctly.

Roman: You don’t have to start with the most extreme examples. I don’t go with paperclip maximizers or whatever. You can talk about career selection, technological unemployment, basic income. Those things are quite understandable and they provide a wonderful base for moving to the next level once we get there.

Lucas: Absolutely. Totally in agreement. How would you describe the typical interactions that you get from mainstream AI and CS researchers who just do sort of standard machine learning and don’t know or really think or care about AGI and ASI? When you talk to them and pitch to them like, “Hey, maybe you should be working on AI safety.” Or, “Hey, AI safety is something that is real, that you should care about.”

Roman: You’re right. There are different types of people based on their background knowledge. There is group one, which never heard of the concept. It’s just not part of their world. You can start by just sharing some literature and you can follow up later. Then there are people who are in complete agreement with you. They know it’s important. They understand the issue, but they have their own job they’re working on, and I think they are sympathetic to the cause.

Then there are people who heard a few kind of not the best attempts to explain what AI risk is, and so they are skeptical. They may be thinking about Terminator movie or something, Matrix, and so they are quite skeptical. In my personal experience, if I had a chance to spend 30 minutes to an hour with a person one-on-one, they all converted. I never had someone who went, “You told me things, but I have zero concern about intelligent systems having bugs in them or side effects or anything like that.”

I think it’s just a question of spending time and making it a friendly experience. You’re not adversaries trying to fight it out. You’re just going, “Hey, every single piece of software we ever produced had bugs in it and can be hacked. How is this different?”

Lucas: I agree with you, but there also seem to be these existence proofs and existence cases of people who are computer scientists and who are super skeptical about AI safety efforts and working on ASI safety, like Andrew Ng and others.

Roman: You have to figure out each individual on a case-by-case basis of course, but just being skeptical about the success of this approach is normal. I told you my main concern: is the problem solvable? That’s a degree of skepticism. Suppose we looked at any other industry. Let’s say we had the oil industry, and the top executives of the oil industry said that global climate change is not important. Just call it redistribution of good weather or something, it’s not a big deal.

You would immediately think there is some sort of conflict of interest, right? But how is this different? If you are strongly dependent on development, not on anything else, it just makes sense that you would be 100% for development. I don’t think it’s unnatural at all. Again, I think a good conversation and realignment of incentives would do miracles for such cases.

Lucas: It seems like either Andrew Ng’s timelines are so long or he just thinks that, fundamentally, there’s not really a big problem. I think there are some computer scientists, researchers who just think there’s not really a problem, because we’re making the systems and the systems are so intertwined with us that the values will just naturally mesh together or something. I’m just so surprised, I guess, that from the mainstream CS and AI people you don’t run into more skeptics.

Roman: I don’t start my random interactions with people by trying to tell them, “You are wrong. Change your mind.” That’s usually not the best approach. Then you talk about specific cases and you can take it slowly and increase the level of concern. You can start by talking about algorithmic justice and bias in algorithms and software verification. I think you’ll get 100% support at all those levels.

What happens when your system is slightly more capable, you’re still working with me? I don’t think there is a gap where you go, “Well, at that point, everything becomes rosy and safe and we don’t have to worry about it.” If a disagreement is about how soon, I think it’s not a problem at all. Everything I argue still applies in 20 years, 50 years, 100 years.

If you’re saying it will take 100 years to get to superintelligence, how long will it take to learn how to control a system we don’t have yet? Probably way longer than that. Already, we should have started 50 years ago. It’s too late now. If anything, it strengthens my point that we should put more resources on the safety side.

Lucas: Absolutely. Just a question about generally your work cataloging failures of AI products and what this means for the future.

Roman: I collect examples, historical examples starting with the very first AI systems, still everyday news of how AI systems fail. The examples you all heard about. Self-driving car kills a pedestrian. Or Microsoft Tay chat bot becomes racist and swears at people. I have maybe about 50 or 60 so far. I keep collecting new ones. Feel free to send me lots of cool examples, but make sure they’re not already on my list.

The interesting thing is the patterns you can get from it, learn from, and use to predict future failures. One, obviously as AI becomes more common, we have more of those systems, the number of such failures grows. I think it grows exponentially and the impacts from them grow.

Now, we have intelligent systems trading in the stock market. I think they take up something like 85% of all stock trades. We had examples where they crashed the whole stock market, brought down the volume by $1 trillion or something, closed significant amounts. This is very interesting data. I try to create a data set of those examples and there is some interest from industry to understand how to make their products not make my list in the future.

I think so far the only … It sounds like a trivial conclusion, but I think it’s fundamental. The only conclusion I have is that if you design an AI system to do X, it will very soon fail to do X, whatever X stands for. It seems like it’s only going to get worse as they become more general because the value of X becomes not just narrow. If you designed a system to play chess, then it will fail to win a chess match. That’s obvious and trivial. But if you design the system to run the world or something like that, what is X here?

Lucas: This makes me think about failure modes. Artificial superintelligence is going to have a probability space of failure modes where the severity of the failure at the worst end … We covered this in my last podcast is it would literally be turning the universe into the worst possible suffering imaginable for everyone for as long as possible. That’s some failure mode of ASI which has some probability which is unknown. Then the opposite on the other end is going to be, I guess, the most well-being and bliss for all possible minds, which exists in that universe. Then there’s everything in between.

I guess the question is, is there any mapping or how important is it in mapping this probability space of failure modes? What are the failure modes that ASI can do or that would occur that would make it not value aligned? What are the probabilities of each of those given, I don’t know, the sort of architecture that we expect ASI to have or how we expect ASI to function?

Roman: I don’t think there is a worst and best case. I think it’s infinite in both directions. It can always get worse and always get better.

Lucas: But it’s constrained by what is physically possible.

Roman: Knowing what we know about physics and within this universe, there is a big multiverse out there possibly with different types of physics and simulated environments can create very interesting side effects as well. That’s not the point. I also collect predicted failures of future systems, part of the same report. You can look it up. It’s very interesting to see what usually scientists, but sometimes science fiction writers and other people, have suggested as potential examples.

It has things like paperclip maximizer and other examples. I look at predictions which are predictions but short-term. For example, we can talk about sex robots and how they’re going to fail. Someone hacks them, then they forget to stop. You forget your safe word. There are interesting possibilities.

Very useful both as an educational tool to get people to see this trend and go, “Okay. At every level of AI development, we had problems proportionate to the capability of AI. Give me a good argument why it’s not the case moving forward?” Very useful tool for AI safety researchers to predict. “Okay, we’re releasing this new system tomorrow. It’s capable of X.” How can we make sure the problems don’t follow?

I published on this, for example, before Microsoft released their Tay chatbot. Giving access to users to manipulate your learning data is usually not a safe option. If they just knew about it, maybe they wouldn’t embarrass themselves so bad.

Lucas: Wonderful. I guess just one last question here. My view was that given a superintelligence originating on earth, there would be a physical maximum of the amount of matter and energy which it could manipulate given our current understanding and laws of physics, which are certainly subject to change if we gain new information.

There is something which we could call, as Nick Bostrom explains, the cosmic endowment which is sort of the sphere around an intelligent species, which is running a superintelligent optimization process. Where the sphere represents the maximum amount of matter and energy, a.k.a., galaxies a superintelligence can reach before the universe expands so much that it’s no longer able to get beyond that point. Why is your view that there isn’t a potentially physical best or physical worst thing that, that optimization process could do?

Roman: Computation is done with respect to time. It may take you twice as long to compute something with the same resources, but you’ll still get that if you don’t have limits on your time. Or you create a subjective time for whoever is experiencing things. You can have computations which are not in parallel, serial computation devoted to a single task. It’s quite possible to create, for example, levels of suffering which progressively get worse I think. Again, I don’t encourage anyone experimenting with that, but it seems like things can get worse not just because of limitations, of how much computing I can do.

Lucas: All right. It’s really been a wonderful and exciting conversation Roman. If people want to check out your work or to follow you on Facebook or Twitter or wherever else, what do you recommend people go to read these papers and follow you?

Roman: I’m very active in social media. I do encourage you to follow me on Twitter, RomanYam, or on Facebook, Roman Yampolskiy. Just Google my name. My Google Scholar has all the papers, and, just trying to make a sale here, I have a new book coming out, Artificial Intelligence Safety and Security. It’s an edited book with all the top AI safety researchers contributing, and it’s due out in August, mid August. Already available for presale.

Lucas: Wow. Okay. Where can people get that? On Amazon?

Roman: Amazon is a great option. It’s published by CRC Press, so you have multiple options right now. I think it’s available as a softcover and hardcover, which are a bit pricey. It’s a huge book about 500 pages. Most people would publish it as a five book anthology, but you get one volume here. It should come out as a very affordable digital book as well, about $30 for 500 pages.

Lucas: Wonderful. That sounds exciting. I’m looking forward to getting my hands on that. Thanks again so much for your time. It’s really been an interesting conversation.

Roman: My pleasure and good luck with your podcast.

Lucas: Thanks so much. If you enjoyed this podcast, please subscribe, give it a like or share it on your preferred social media platform. We’ll be back again soon with another episode in the AI Alignment Series.

Podcast: Mission AI – Giving a Global Voice to the AI Discussion with Charlie Oliver and Randi Williams

How are emerging technologies like artificial intelligence shaping our world and how we interact with one another? What do different demographics think about AI risk and a robot-filled future? And how can the average citizen contribute not only to the AI discussion, but AI’s development?

On this month’s podcast, Ariel spoke with Charlie Oliver and Randi Williams about how technology is reshaping our world, and how their new project, Mission AI, aims to broaden the conversation and include everyone’s voice.

Charlie is the founder and CEO of the digital media strategy company Served Fresh Media, and she’s also the founder of Tech 2025, which is a platform and community for people to learn about emerging technologies and discuss the implications of emerging tech on society. Randi is a doctoral student in the Personal Robotics Group at the MIT Media Lab. She wants to understand children’s interactions with AI, and she wants to develop educational platforms that empower non-experts to develop their own AI systems. 

Topics discussed in this episode include:

  • How to inject diversity into the AI discussion
  • The launch of Mission AI and bringing technologists and the general public together
  • How children relate to AI systems, like Alexa
  • Why the Internet and AI can seem like “great equalizers,” but might not be
  • How we can bridge gaps between the generations and between people with varying technical skills

Papers discussed in this episode include:

You can listen to this episode above or read the transcript below. And don’t forget to check out previous episodes of FLI’s monthly podcast on SoundCloud, iTunes, Google Play and Stitcher.


Ariel: Hi, I am Ariel Conn with The Future of Life Institute. As a reminder, if you’ve been enjoying our podcasts, please remember to take a minute to like them, and share them, and follow us on whatever platform you listen on.

And now we’ll get on with our podcast. So, FLI is concerned with broadening the conversation about AI, how it’s developed, and its future impact on society. We want to see more voices in this conversation, and not just AI researchers. In fact, this was one of the goals that Max Tegmark had when he wrote his book, Life 3.0, and when we set up our online survey about what you want the future to look like.

And that goal of broadening the conversation is behind many of our initiatives. But this is a monumental task, that we need a lot more people working on. And there is definitely still a huge communications gap when it comes to AI.

I am really excited to have Charlie Oliver, and Randi Williams with me today, to talk about a new initiative they’re working on, called Mission AI, which is a program specifically designed to broaden this conversation.

Charlie Oliver is a New York based entrepreneur. She is the founder and CEO of Served Fresh Media, which is a digital media strategy company. And, she’s also the founder of Tech 2025, which is a platform and community for people to learn about emerging technologies, and to discuss the implications of emerging tech on our society. The mission of Tech 2025 is to help humanity prepare for, and define what that next technological era will be. And so it was a perfect starting point for her to launch Mission AI.

Randi Williams is a doctoral student in the personal robotics group at the MIT Media Lab. Her research bridges psychology, education, engineering, and robotics, to accomplish two major goals. She wants to understand children’s interactions with AI, and she wants to develop educational platforms that empower non-experts to develop their own AI systems. And she’s also on the board of Mission AI.

Randi and Charlie, thank you both so much for being here today.

Charlie: Thank you. Thank you for having us.

Randi: Yeah, thanks.

Ariel: Randi, we’ll be getting into your work here a little bit later, because I think the work that you’re doing on the impact of AI on childhood development is absolutely fascinating. And I think you’re looking into some of the ethical issues that we’re concerned about at FLI.

But first, naturally we wanna start with some questions about Mission AI. And so for example, my very first question is, Charlie can you tell us what Mission AI is?

Charlie: Well, I hope I can, right? Mission AI is a program that we launched at Tech 2025. And Tech 2025 was launched back in January of 2017. So we’ve been around for a year and a half now, engaging with the general public about emerging technologies, like AI, blockchain, machine learning, VR/AR. And, we’ve been bringing in experts to engage with them — researchers, technologists, anyone who has a stake in this. Which pretty much tends to be everyone, right?

So we’ve spent the last year listening to both the public and our guest speakers, and we’ve learned so much. We’ve been so shocked by the feedback that we’ve been getting. And to your initial point, we learned, as I suspected early on, that there is a big, huge gap between how the general public is interpreting this, and what they expect, and how researchers are interpreting this. And how corporate America, the big companies, are interpreting this, and hope to implement these technologies.

Equally, those three separate entities also have their fears, their concerns, and their expectations. We have seen the collision of all three of those things at all of our events. So, I decided to launch Mission AI to be part of the answer to that. I mean, because as you mentioned, it is a very complicated, huge problem, monumental. And what we will do with Mission AI, is to address the fact that the general public really doesn’t know anything about the AI, machine learning research that’s happening. And there’s, as you know, a lot of money, globally, being tossed — I don’t wanna say toss — but AI research is heavily funded. And with good reason.

So, we want to do three things with this program. Number one, we want to educate the general public on the AI machine learning research ecosystem. We happen to believe that it’s crucial that, in order for the general public to participate — and understand what I mean by the general public, I should say, that includes technologists. Like 30 to 35 percent of our audience are engineers, and software developers, and people in tech companies, or in companies working in tech. They also include business people, entrepreneurs, students, we have baby boomers, we have a very diverse audience. And we designed it so that we can have a diverse conversation.

So we want to give people an understanding of what AI research is, and that they can actually participate in it. So we define the ecosystem for them to keep them up to date on what research is happening, and we give them a platform to share their ideas about it, and to have conversations in a way that’s not intimidating. I think research is intimidating for a lot of people, especially academic research. We however, will be focusing more on applied research, obviously.

The second thing that we want to do is, we want to produce original research on public sentiment, which, it’s a huge thing to take on, but the more that we have moved, grown this community — and we have several thousand people in our community now, we’ve done events here, and in Toronto; we’ve done over 40 events across different topics — we are learning that people are expressing ideas, and concerns, and just things that I have been told by researchers who come in to speak at our events, it’s surprising them. So, it’s all the more important that we get the public sentiment and their ideas out. So our goal here is to do research on what the public thinks about these technologies, about how they should be implemented, and on the research that is being presented. So a lot of our research will be derivative of already existing research that’s out there.

And then number three, we want to connect the research community, the AI research community, with our community, or with the broader public, which I think is something that’s really, very much missing. And we have done this at several events, and the results are not only absolutely inspiring, everyone involved learns so much. So, it’s important, I think, for the research community to share their work with the general public, and I think it’s important for the general public to know who these people are. There’s a lot of work being done, and we respect the work that’s being done, and we respect the researchers, and we want to begin to show the face of AI and machine learning, which I think is crucial for people to connect with it. And then also, that extends to Corporate America. So the research will also be available to companies, and we’ll be presenting what we learn with them as well. So that’s a start.

Ariel: Nice. So to follow up on that a little bit, what impact do you hope this will have? And Randi, I’d like to get your input on some of this as well in terms of, as an AI researcher, why do you personally find value in trying to communicate more with the general public? So it’s sort of, two questions for both of you.

Randi: Sure, I can hop in. So, a lot of what Charlie is saying from the researcher’s side, is a big question. It’s a big unknown. So actually a piece of my research with children is about, well when you teach a child what AI is, and how it works, how does that change their interaction with it?

So, if you were to extend that to something that’s maybe more applicable to the audience: if you were to teach your great, great grandma about how all of the algorithms in Facebook work, how does that change the way that she posts things? And how does that change the way that she feels about the system? Because we very much want to build things that are meaningful for people, and that help people reach their goals and live a better life. But it’s often very difficult to collect that data. Because we’re not huge corporations, we can’t do thousand person user studies.

So, as we’re developing the technology and thinking about what directions to go in, it’s incredibly important that we’re hearing from the baby boomers, and from very young people, from the scientists and engineers who are maybe in similar spaces, but not thinking about the same things, as well as from parents, teachers, all of the people who are part of the conversation.

And so, I think what’s great about Mission AI is that it’s about access, on both ends.

Charlie: So true. And you know, to Randi’s point, the very first event that we did was January the 11th, 2017, and it was on chatbots. And I don’t know if you guys remember, but that doesn’t seem like a long time ago, but people really didn’t know anything about chatbots back then.

When we had the event, which was at NYU, it sold out in record time, like in two days. And when we got everybody in the room, it was a very diverse audience. I mean we’re talking baby boomers, college students, and the first question I asked was, “How many people in here are involved in some way with building, or developing chatbots, in whatever way you might be?” And literally I would say about, 20 to 25 percent of the hands went up.

For everyone else, I said, “Well, what do you know about chatbots? What do you know about it?” And most said, “Absolutely nothing.” They said, “I don’t know anything about chatbots, I just came because it looked like a cool event, and I wanna learn more about it.”

But, by the end of the event, we help people to have these group discussions and solve problems about the technologies, together. So that’s why it’s called a think tank. At the end of the event there were these two guys who were like 25, they had a startup that works with agencies that develop chatbots for brands. So they were very much immersed in the space. After the event, I would say a week later, one of them emailed me and said, “Charlie, oh my God, that event that you did, totally blew our minds. Because we sat in a group with five other people, and one of those people was John. He’s 75 years old. And he talked to us.” Part of the exercise that they had to do was to create a Valentine’s Day chatbot, and to write the conversational flow of that chatbot. And he said that after talking to John, who’s 75 years old, about what the conversation would be, and what it should be, and how it can resonate with real people, and different types of people. He said that they realized they had been building chatbots incorrectly all along. He realized that they were narrowing their conversations, in the conversational flows, in a way that restricted their technology from being appealing to someone like him. And they said that they went back, and re-did a lot of their work to accommodate that.

So I thought that was great. I think that’s a big thing in terms of expectations. We want to build these technologies so that they connect with everyone. Right?

Ariel: I’d like to follow up with that. So there’s basically two sides of the conversation. We have one side, which is about educating the public about the current state, and future of artificial intelligence. And then, I think the other side is helping researchers better understand the impact of their work by talking to these people who are outside of their bubbles.

It sounds to me like you’re trying to do both. I’m curious if you think both are either, equally challenging, or easy to address, or do you think one side is harder? How do you address both sides, and effect change?

Charlie: That is a great, great question. And I have to tell you that on both sides, we have learned so much, about both researchers, and the general public. One of the things that we learned is that we are all taking for granted what we think we know about people. All of us. We think we’ve got it down. “I know what that student is thinking. I know what that black woman is thinking. I know how researchers think.” The fact of the matter is, we are all changing so much, just in the past two to three years, think about who you were three years ago. We have changed how we think about ourselves and the world so much in the past two years, that it’s pretty shocking, actually. And even within the year and a half that we have been up and going, my staff and I, we sit around and talk about it, because it kind of blows our minds. Even our community has changed how they think about technologies, from January of last year, to today. So, it’s actually extremely, extremely difficult. I thought it would get easier.

But here’s the problem. Number one, again, we all make assumptions about what the public is thinking. And I’m gonna go out on a limb here and say that we’re all wrong. Because they are changing the way that they think, just as quickly as the technologies are changing. And if we don’t address that, and meet that head on, we are always going to be behind, or out of sync, with what the general public is thinking about these technologies. And I don’t think that we can survive. I don’t think that we can actually move into the next era of innovation unless we fix that.

I will give you a perfect example of that. Dr. James Fan co-created the IBM Watson Q&A system. And he’s one of our speakers. He’s come to our events maybe two or three times to speak.

And he actually said to me, as I hear a lot from our researchers who come in, he says, “My God, Charlie, every time I come to speak at your event, I’m blown away by what I hear from people.” He said, “It seems like they are thinking about this very differently.” He says, “If you ask me, I think that they’re thinking far more in advance than we think that they are.”

And I said, “Well, that shocks me.” And so, to give you a perfect example of that, we did an event with Ohio State regarding their Opioid Technology Challenge. And we had people in New York join the challenge, to figure out AI technologies that could help them in their battle against opioid addiction in their state. And I had him come in, as well as several other people come in, to talk about the technologies that could be used in this type of initiative. And James is very excited. This is what I love about researchers, right? He’s very excited about what he does. And when he talks about AI, he lights up. I mean you’ve just never seen a man so happy to talk about it. So he’s talking to a room full of people who are on the front lines of working with people who are addicted to opioids, or have some sort of personal connection to it. Because we invited people like emergency responders, we invited people who are in drug treatment facilities, we’ve invited doctors. So these are people who are living this.

And the more he talked about algorithms, and machine learning, and how they could help us to understand things, and make decisions, and they can make decisions for us, the angrier people got. They became so visibly angry, that they actually started standing up. This was in December. They started standing up and shouting out to him, “No way, no way can algorithms make decisions for us. This is about addiction. This is emotional.” And they really, it shocked us.

I had to pull him off the stage. I mean, I didn’t expect that. And he didn’t see it, because he just kept talking, and I think he felt like the more he talked about it, the more excited they would become, like him, but it was quite the contrary, they became angrier. That is the priceless example, perfect example, of how the conversations that we have, that we initiate between researchers and the public, are going to continue to surprise us. And they’re going to continue to be shocking, and in some cases, very uncomfortable. But we need to have them.

So, no it is not easy. But yes we need to have them. And in the end, I think we’re all better for it. And we can really build technologies that people will embrace, and not protest.

Ariel: So Randi, I’d like to have you jump in now, because you’ve actually done, from the researcher side, you’ve done an event with Tech 2025, or maybe more than one, I’m not sure. So I was hoping you could talk about your experience with that, and what you gained out of it.

Randi: Yeah, so at that event I was talking about a piece of research I had done, where I had children talk about their perceptions of smart toys. And so this is also, like Charlie was saying, a hugely inflammatory topic because, I don’t know, parents are extremely freaked out. And I think, no offense to the media, but there’s a bit of fear mongering going on around AI and that conversation. And so, as far as what’s easier, I think the first step, what makes it really difficult for researchers to talk to the public right now, is that we have been so far out of the conversation, that the education has gotten skewed. And so it’s difficult for us to come in and talk about algorithms, and machines making decisions, without first dealing with, you know, “This is okay, and it’s not a Terminator kind of thing. At the end of the day, humans are still in control of the machines.”

So what was really interesting about my experience, talking with Tech 2025, is that, I had all of these different people in the room, a huge variety of perspectives. And the biggest thing to hear, was what people already knew. And, as I was talking and explaining my research, hearing their questions, understanding what they understood already, what they knew, and what wasn’t so clear. So one of the biggest things is, when you see an AI system teach itself to play chess, and you’re like, “Oh my God, now it’s gonna teach itself to like, take over a system, and hack into the government, and this is that.” And it’s like, no, no, it’s just chess. And it’s a huge step to get any further than that.

And so it was really great practice for me to try and take people who are in that place, and say, “Well no, actually this is how the technology works, and these are the limitations.” And try to explain, you know, so when could this happen, in what particular universe could this happen? Well maybe, like in 20 years if we find a general AI, then yeah, it could teach itself to solve any problem. But right now, every single problem requires years of work.

And then seeing what metaphors work. What metaphors make sense for an AI scientist who wants to relate to the public. What things click, which things don’t click? And I think, another thing that happened, that I really loved was, just thinking about the application space. I’m asking research questions that I think are intellectually interesting for my work. But, there was a person from a company, who was talking about implementing a skill in Alexa, and how they didn’t know if using one of their characters on Alexa, would be weird for a child. Because, I was talking about how children look at an Alexa, and they think Alexa’s like a person. So Alexa is an Alexa, and if you talk to another Alexa, that’s a new Alexa. Yeah they have the same name, but completely different people, right?

So what happens when Alexa has multiple personality disorder? Like how does a child deal with that? And that was a question that never would have come up, because I’m not writing skills with different characters for children. So, that’s just an example of how learning as an AI scientist, how to give, how to listen to what people are trying to understand, and how to give them the education they need. But then also taking, okay, so when you’re at home and your child is doing xyz with Alexa, where are the questions there that you have, that researchers should be trying to answer? So, I don’t know which one is harder.

Charlie: I specifically went after Randi for this event. And I invited her because, I had been thinking in my mind for a while, that we are not talking about children in AI, not nearly enough. Considering that they’re gonna be the ones in ten to 15 years who are gonna be developing these things, and this technology and everything. So I said, “You know, I am willing to bet that children are thinking very differently about this. Why aren’t we talking about it?” So, I get online, I’m doing all my, as anyone would, I do all my little research to try to figure it out, and when I came across Randi’s research, I was blown away.

And also, I had her in mind with regards to this because I felt like this would be the perfect test of seeing how the general public would receive research, from a research assistant who is not someone who necessarily has — obviously she’s not someone who has like 20 years of experience behind her, she’s new, she’s a fresh voice. How would she be received? How would the research be received?

And on top of that, to be honest with you, she’s a young black woman. Okay? And in terms of diversity of voices within the research community, and within the AI discussion as a whole, this is something I want to address, aggressively.

So we reached out to the toy companies, we reached out to child psychologists, teachers, students, children’s museums, toy stores, I can’t tell you how many people we reached out to in the greater New York City area.

Randi was received so well, that I had people coming up to me, and high fiving me, saying, “Where did you get her? Where did you find her?” And I’m like, “Well you know, she didn’t drop out of the sky. She’s from MIT.”

But Randi’s feedback was crucial for me too because, I don’t know what she’s getting from it. And we cannot be effective at this if we are not, all of us, learning from each other. So if my researchers who come in and speak aren’t learning, I’m not doing my job. Same with the audience.

Ariel: So, Randi, I’m gonna want to start talking about your research here in a minute, ’cause we’ve just gotten a really great preview of the work you’re doing. But before we get to that, one, not final question, but for a little bit, a final question about Mission AI, and that is this idea of diversity.

AI is not a field that’s known for being diverse. And I read the press release about this, and the very first thing, in the very first bullet point, about what Mission AI is going to do, was about injecting diversity. And so my question to both of you is, how can we do that better? How can the AI community do that better? And in terms of the dialogue for who you’re reaching out to, as well, how can we get more voices?

Randi: You know in some ways, it’s like, there’s nothing you can do, to not do better. I think what Mission AI is really about, is thinking about who’s coming to the table to hear these things, very critically. And being on the board, as Charlie said, a black woman, the people who I talk to in AI are people of color, and women, right? So, I hope that as being a main part of this, and having Charlie also be a main part of that, we have a network that’s both powerful, in terms of having the main players in AI come to the table, but you know, main players that are also not, I guess the stereotypical AI scientist that you would think of.

So, what makes this different is who’s leading it, and the fact that we’re thinking about this from the very beginning. Like, “Okay, we’re gonna reach out. We want to recruit research scientists,” so I’m thinking of my peers who are in schools all across the country, and what they’re doing, and how this can be meaningful for them, and how they can, I guess, get an experience in communicating their research with the public.

Charlie: Yeah, I totally agree.

In addition to that, bringing in people who are from different backgrounds, and bringing diversity to the speakers, is very important. But it’s equally as important to have a diverse room. The first thing that I decided when I launched Tech 2025, and the reason that I’ve decided to do it this way, is because, I did not want to have a room full of the hoodie crowd. Which is, you know, white guys in their 20’s with hoodies on. Right? That’s the crowd that usually gets the attention with regards to AI and machine learning. And no offense to them, or to what they’re doing, everyone’s contributing in their own way.

But I go to tech events, as I know you guys do too. I go to tech events here, and in San Francisco, and across the country, and different parts of the world. And, I see that for the most part a lot of these rooms are filled, especially if you talk about blockchain, and cryptocurrency, which we do as well, they’re filled with primarily white guys.

So, I intentionally, and aggressively, made it a point to include as many people from various backgrounds as possible. And it is a very deliberate thing that you have to do, starting with the content. I don’t think a lot of people realize that, because people say to me, “How do you get such diverse people in the room?”

Well number one, I don’t exclude anyone, but also, the content itself asks people from various backgrounds to come in. So, a lot of times, especially in our earlier events, I would make a point of saying, it doesn’t matter who you are, where you’re from, we don’t care if you’re a technologist, or if you are a baby boomer who’s just curious about this stuff, come on in. And I have actually had people in their 60s come to me, I had a woman come to me last year, and she says, “My God Charlie, I feel like I really can participate in these discussions at your event. I don’t feel like I’m the odd woman out, because I’m older.”

So I think that’s a very important thing, is that, when researchers look at the audience that they’re talking to, they need to see diversity in that audience too. Otherwise, you can reinforce the biases that we have. So if you’re a white guy and you’re talking to an audience full of nothing but white guys, you’re reinforcing that bias that you have about what you are, and the importance of your voice in this conversation.

But when my guests come in to speak, I tell them first and foremost, “You are amazing. I love the work that you do, but you’re not the … The star of the show is the audience. So when you look at them, just know that they are, it’s very important that we get all of their feedback. Right? That we allow them to have a voice.” And it turns out that that’s what happens, and I’m really, I’m happy that we’re creating a dialogue between the two. It’s not easy. I think it’s definitely what needs to happen. And with going back to what Randi says, it does need to be deliberate.

Ariel: I’m going to want to come back to this, because I want to talk more about how Mission AI will actually work. But I wanna take a brief pause, because we’ve sort of brought up some of Randi’s work, and I think her work is really interesting. So I wanted to talk, just a little bit about that, since the whole idea of Mission AI is to give a researcher a platform to talk about their work too.

So, one of my favorite quotes ever, is the Douglas Adams quote about age and technology, and he says, “I’ve come up with a set of rules that describe our reactions to technologies. One, anything that is in the world when you’re born, is normal and ordinary and is just a natural part of the way the world works. Two, anything that’s been invented when you’re 15 to 35 is new, and exciting, and revolutionary, and you can probably get a career in it. Three, anything invented after you’re 35 is against the natural order of things.”

Now, I personally, I’m a little bit worried that I’m finding that to be the case. And so, one of things that I’ve found really interesting is, we watch these debates about what the impact of AI will be on future generations. There are technologies that can be harmful, period. And trying to understand, when you’re looking at a technology that can be harmful, versus when you’re looking at a technology and you just don’t really know what the future will be like with it, I’m really curious what your take on how AI will impact children as they develop, is. You have publications that, there’s at least a couple great titles. One is, “Hey Google, is it okay if I eat you?” And then another is, “My Doll Says It’s Okay, Voice Enabled Toy Influences Children’s Moral Decisions.”

So, my very first question for you is, what are you discovering so far with the way kids interact with technology? Is there a reason for us to be worried? Is there also reason for us to be hopeful?

Randi: So, now that I’m hearing you say that, I’m like, “Man I should edit the titles of my things.”

First, let me label myself as a huge optimist of AI. Obviously I work as an AI scientist. I don’t just study ethics, but I also build systems that use AI to help people reach their goals. So, yeah, take this with a grain of salt, because obviously I love this, I’m all in it, I’m doing a PhD on it, and that makes my opinion slightly biased.

But here’s what I think, here’s the metaphor that I like to use when I talk about AI, it’s kind of like the internet. When the internet was first starting, people were like, “Oh, the Internet’s amazing. It’s gonna be the great equalizer, ’cause everyone will be able to have the same education, ’cause we’ll all have access to the same information. And we’re gonna fix poverty. We’re gonna fix, everything’s gonna go away, because the internet.” And in 2018, the Internet’s kind of like, yeah, it’s the internet, everyone has it.

But it wasn’t a great equalizer. It was the opposite. It’s actually creating larger gaps in some ways, in terms of people who have access to the internet, and can do things, and people who don’t have access. As well as, what you know about on the internet makes a huge difference in your experience on it. It also in some ways, promotes, very negative things, if you think about like, the dark web, modern day slavery, all of these things, right? So it’s like, it’s supposed to be great, it’s supposed to be amazing. It went horribly wrong. AI is kind of like that. But maybe a little bit different in that, people are already afraid of it before it’s even had a chance.

In my opinion, AI is the next technology that has the potential to be a great equalizer. The reason for that is, because it’s able to extend the reach that each person has in terms of their intellectual ability, in terms of their physical ability. Even, in terms of how they deal with things emotionally and spiritually. There’s so many places that it can touch, if the right people are doing it, and if it’s being used right.

So what’s happening right now, is this conversation with children in AI. The toy makers, and the toy companies are like, “We can create a future where every child grows up, and someone is reading to them, and we’re solving all the problems. It’s gonna be great.” And then they say to the parents, “I’m gonna put this thing in your home, and it’s gonna record everything your child says, and then it’s gonna come back to our company, and we’re gonna use it to make your life better. And you’re gonna pay us for it.” And parents are like, “I have many problems with this. I have many, many problems with everything that you’re saying.”

And so, there’s this disconnect between the potential that AI has, and the way that it’s being seen by the public, because, people are recognizing the dangers of it. They’re recognizing that the amount of access that it has, is like, astronomical and crazy. So for a second, I’ll talk about the personal robots group. In the MIT Media Lab, the personal robots group, we specifically build AI systems that are humanistic. Meaning that we’re looking at the way that people interact with their computers, and with cellphones, and it’s very, cagey. It’s very transactional, and in many ways it doesn’t help people live their lives better, even though it gives them more access. It doesn’t help them achieve all of their goals. Because you know, in some ways it’s time consuming. You see a group of teenagers, they’re all together, but they’re all texting on phones. It’s like, “Who are you talking to? Talk to your friends, they’re right there.” But that’s not happening, so we built systems specifically, that try to help people achieve their goals. One great example of that, is we found educational research that says that your vocabulary at the age of five, is a direct predictor of your PSAT score in the 11th grade. And as we all know, your PSAT score is a predictor of your SAT score. Your SAT score is a predictor of your future income, and potential in life, and all these great things.

So we’re like, “Okay, we wanna build a robot that helps children, who may not have access for any number of reasons, be able to increase their vocabulary size.” And we were gonna use AI that can personalize to each child, because every child’s different. Some children want the competitive robot that’s gonna push them, some children want the friendly robot that’s gonna work with them, and ask them questions, and put them in the perspective of being a teacher. And, AI is the only thing, like in a world, where classroom sizes are getting bigger, where parents can’t necessarily spend as much time at home, those are the spaces where we’re like, AI can help. And so we build systems that do that.

We don’t just think about teaching this child vocabulary words. We think about how the personality of the robot is shaping the child as a learner. So how is the robot teaching the child to have a growth mindset, and teaching them to persevere, to continue learning better. So those are the kinds of things that we want to instill, and AI can do that.

So, when people say, “AI is bad, it’s evil.” We’re like, “Well, we’re using a robot that teaches children that working hard is more important than just being magically smart.” ‘Cause having a non-growth mindset, like, “I’m a genius,” can actually be very limiting ’cause when you mess up, then you’re like, “I’m not a genius. I’m stupid.” It’s like, no, work hard, you can figure things out.

So, personally, I think, that kind of AI is extremely impactful, but the conversation that we need to have now, is how do we get that into the public space, in an appropriate way. So maybe, huge toy companies shouldn’t be the ones to build it, because they obviously have a bottom line that they’re trying to fill. Maybe, researchers are the ones who wanna build it. My personal research is about helping the public build their own AI systems to reach these goals. I want a parent to be able to build a robot for their child, that helps the child better reach their goals. And not to replace the parent, but you know, there are just places where a parent can’t be there all the time. Play time, how can play time, how can the parent, in some ways, engineer their child’s play time, so that they’re helping the child reinforce having a growth mindset, and persevering, and working hard, and maybe cleaning up after yourself, there are all these things.

So if children are gonna be interacting with it anyways, how can we make sure that they’re getting the right things out of that?

Ariel: I’d like to interject with a question real quick. You’d mentioned earlier that parents aren’t psyched about having all of their kids’ information going back to toy companies.

Randi: Yeah.

Ariel: And so, I was gonna ask if you see ways in which AI can interact with children that doesn’t have to become basically massive data dumps for the AI companies? Is this, what you’re describing, is that a way in which parents can keep their children’s data private? Or would that still end up, all that data go someplace?

Randi: The way that the AI works depends heavily on the algorithm. And what’s really popular right now, are deep learning algorithms. And deep learning algorithms, they’re basically, instead of figuring out every single rule, like instead of hard programming every single possible rule and situation that someone could run into, we’re just gonna throw a lot of data at it, and the computer will figure out what we want at the end. So you tell it, what you have at the beginning, you tell it what you want at the end, and then the computer figures out everything.

That means you have to have like massive amounts of data, like, Google amounts of data, to be able to do that really well. So, right now, that’s the approach that companies are taking. Like, collect all the data, you can do AI with it, and we’re off to the races.

The systems that we’re building are different because, they rely on different algorithms than ones that require huge amounts of data. So we’re thinking about, how can we empower people so that … You know, it’s a little bit harder, you have to spend some time, you can’t just throw data at it, but it allows people to have control over their own system.

I think that’s hugely important. Like, what if Alexa wasn’t just Alexa; Alexa was your Alexa? You could rename her, and train her, and things like that.

Charlie: So, to Randi’s point, I mean I really totally agree with everything that she’s saying. And it’s why I think it’s so important to bring researchers, and the general public, together. Literally everything that she just said, it’s what I’m hearing from people at these events. And the first thing that we’re hearing is that people, obviously they’re very curious, but they are also very much afraid. And I’m sometimes surprised at the level of fear that comes into the room. But then again, I’m not, because the reason, I think anyway, that people feel so much fear about AI, is that they aren’t talking about it enough, in a substantive way.

So they may talk about it in passing, they may hear about it, or read about it online. But when they come into our events, we force them to have these conversations with each other, looking each other in the eye, and to problem solve about this stuff. And at the end of the evening, what we always hear, from so many people, is that number one, they didn’t realize that, it wasn’t as bad as they thought it was.

So there’s this realization that once they begin to have the conversations, and begin to feel as if they can participate in the discussion, then they’re like, “Wow, this is actually pretty cool.” Because part of our goal is to help them to understand, to Randi’s point, that they can participate in developing these technologies. You don’t have to have an advanced degree in engineering, and everything. They’re shocked when I tell them that, or when they learn it for themselves.

And the second thing, to Randi’s point, is that, people are genuinely excited about the technologies, after they talk about it enough to allow their fears to dissipate. So, the immediate emotional reaction to AI, and to the fear of data, and it’s a substantive fear, because they’re being told by the media that they, you know, they should be afraid. And to some degree, obviously, there is a big concern about this. But once they are able to talk about this stuff, and to do the exercises, and to think through these things, and to ask questions of the guest speakers and researchers, they then start asking us, and emailing us, saying “What more can I do? I wanna do more. Where can I go to learn more about this?”

I mean we’ve had people literally up-skill, just go take courses in algorithms and everything. And so one of the things that we’ve done, which is a part of Mission AI is, we now have an online learning series called, Ask the Experts, where we will have AI researchers, answer questions about things that people are hearing and seeing in the news. So we’ll pick a hot topic that everyone is talking about, or that’s getting a lot of play, and we will talk about that from the perspective of the researcher. And we’ll present the research that either supports the topic, or the particular angle that the reporter is taking, or refutes it.

So we actually have one coming up on algorithms, and on YouTube’s algorithm, it’s called, Reverse Engineering YouTube’s Algorithms, and it talks about how the algorithms are causing the YouTube creators a lot of anxiety, because they feel like the algorithm is being unfair to them, as they say it. And that’s a great entry point for people, for the general public, to have these discussions. So researchers will be answering questions that I think we all have.

Ariel: So, I’m hesitant to ask this next question, because I do, I like the idea of remaining hopeful about technology, and about AI. But, I am curious as to whether or not, you have found ethical issues regarding children’s interactions with artificial intelligence, or with Alexa, or any of the other AIs that they might be playing with?

Randi: Of course there are ethical issues. So, I guess to talk specifically about the research. I think there are ethical issues, but they raise more questions than answers. So, in the first study that we did, the Hey Google, is it Okay if I Eat You? We would see things like, some of the older children thought that Alexa was smarter than them, because it could answer all of their questions. But then conversely, the younger children would say, “Well it’s not smarter than me, because it doesn’t know what my favorite song is,” or it doesn’t know about, some TV show that they watch. And so, that led us to ask the question, well what does it mean when a child says that something is more intelligent than them?

And so we followed up with a study that was also recently published. So we had children compare the intelligence of a mouse, to the intelligence of a robot, to their own intelligence. And the way that we did this was, all three of them solved a maze. And then we listened to the way that children talked about each of the different things as they were solving the maze. So first of all, the children would say immediately, “The robot solved it the best. It’s the smartest.” But what we came to realize, was that, they just thought robots were smart in general. Like that was just the perception that they had, and it wasn’t actually based on the robot’s performance, because we had the mouse and the robot do the exact same performance. So they would say, “Well the mouse just smells the cheese, so that’s not smart. But the robot, was figuring it out, it had programming, so it’s very smart.”

And then when they looked at their own intelligence, they would be able to think about, and analyze their strategy. So they’re like, “Well I would just run over all the walls until I found the cheese,” or, “I would just, try not to look at places that I had been to before.” But they couldn’t talk about the robot in the same way. Like, they didn’t intellectually understand the programming, or the algorithm that was behind it, so they just sort of saw it as some mystical intelligence, and it just knew where the cheese was, and that’s why it was so fast. And they would be forgiving of the robot when it made mistakes.

And so, what I’m trying to say, is that, when children even say, “Oh that thing is so smart,” or when they say, “Oh I love my talking doll,” or, “Oh I love Alexa, she’s my best friend.” Even when they are mean to Alexa, and do rude things, a lot of parents look at that and they say, “My child is being brainwashed by the robots, and they’re gonna grow up and not be able to socialize, ’cause they’re so emotionally dependent on Alexa.”

But, our research, that one, and the one that we just did with the children’s conformity, what we’re finding is that, children behave very differently when they interact with humans, than when they interact with these toys. And, it’s like, even if they are so young, ’cause we work with children from four to ten years old. Even if they’re four years old, and they can’t verbalize how the robot is different, their behavior is different. So, at some subconscious level, they’re acknowledging that this thing is not a human, and therefore, there are different rules. The same way that they would if they were interacting with their doll, or if they were interacting with a puppy, or a piece of food.

So, people are very freaked out, because they’re like “Oh these things are so lifelike, and children don’t know the difference, and they’re gonna turn into robots themselves.” But, mostly what I’ve seen in my research is that we need to give children more credit, because they do know the differences between these things, and they’re very curious and explorative with them. Like, we asked a six year old girl, “What do you want to build a robot for, if you were to build one?” And she was like, “Well I want one to go to countries where there are poor people, and teach them all how to read and be their friend, because some people don’t have friends.” And I was just like, “That’s so beautiful. Why don’t you grow up and start working in our lab now?”

And it’s very different from the kind of conversation that we would have with an adult. The adult would be like, “I want a robot that can do all my work for me, or that can fetch me coffee or beer, or drive my car.” Children are on a very different level, and that’s because they’re like native to this technology. They’re growing up with it. They see it for what it is.

So, I would say, yes there are ethical issues around privacy, and yes we should keep monitoring the situation, but, it’s not what it looks like. That’s why it’s so important that we’re observing behavior, and asking questions, and studying it, and doing research that concretely can sort of say, “Yeah, you should probably be worried,” or, “No, there’s something more that’s going on here.”

Ariel: Awesome, thank you. I like the six year old’s response. I think everyone always thinks of children as being selfish too, and that’s a very non-selfish answer.

Randi: Yeah. Well some of them also wanted robots to go to school for them. So you know, they aren’t all angels, they’re very practical sometimes.

Ariel: I want to get back to one question that I didn’t get a chance to ask about Mission AI that I wanted to. And that’s sort of the idea of, what audiences you’re going to reach with it, how you’re choosing the locations, what your goals specifically are for these initial projects?

Charlie: That’s a question, by the way, that I have struggled with for quite some time. How do we go about doing this? It is herculean, I can’t reach everyone. You have to have some sort of focus, right? It actually took several months to come to the conclusion that we came to. And actually that only happened after research was, ironically, research was published last month in three states on how AI automation is going to impact specific jobs, or specific sectors in three states that are aggressively trying to sort of address this now and trying to educate their public now about what this stuff is.

And from what I’ve read, I think these three states, in their legislation, they feel like they’re not getting the support maybe, that they need or want, from their federal government. And so they figured, “Let’s figure this out now, before things get worse, for all we know. Before people’s concerns reach a boiling point, and we can’t then address it calmly, the way we should.” So those states are Arizona, Indiana, and northeast Ohio. And all three, this past month, released these reports. And I thought to myself, “Well, where’s the need the most?” Because there’s so many topics here that we can cover with regards to research in AI, and everything. And this is a constant dialogue that I’m having also with my advisors, and our advisors, and people in the industries. So the idea of AI and jobs, and the possibility of AI sort of decimating millions of jobs, we’ve heard numbers all over the place; realistically, yes, jobs will go away, and then new jobs will be created. Right? It’s what happens in between that is of concern to everyone. And so one of the things in making this decision that I’ve had to look at, is what I am hearing from the community? What are we hearing that is of the greatest concern from both the general public, from the executives, and just from in general, even in the press? What is the press covering exhaustively? What’s contributing to people’s fears?

And so we’ve found that it is without a doubt, the impact of AI on jobs. But to go into these communities, where number one, they don’t get these events the way we get them in New York and San Francisco. We were never meant to be a New York organization. It was always meant to launch here, and then go where the conversation is needed. I mean, we can say it’s needed everywhere, but there are communities across this country where they really need to have this information, and this community, and in their own way. I’m in no way thinking that we can take what we do here in New York, and retrofit for every other community, and every other state. So this will be very much a learning process for us.

As we go into these different states, and we take the research that they have done on what they think the impact of AI and automation will be on specific jobs, we will be doing events in their communities, and gathering our own research, and trying to figure out the questions that we should be asking of people, at these events that will offer insight for them, for the researchers, and for the legislators.

The other thing that I would say, is that we want to begin to give people actionable feedback on what they can do. Because people are right now, very, very much feeling like, “There’s gotta be something else that I can do.” And understand that there’s a lot of pressure.

As you know, we’re at an all time low, with regards to employment, unemployment. And the concern of the executive today isn’t, “Oh my God, we’re going to lose jobs.” It’s, “Oh my God, how do I fill these jobs?” And so, they have a completely different mindset about this. And their goal is, “How do we up skill people? How do we prepare them for the jobs that are there now, and the ones that are to come?”

So, the research will also hopefully touch on that as well, because that is huge. And I don’t think that people are seeing the opportunities that are available to them in these spaces, and in adjacent spaces to develop the technologies. Or to help define what they might be, or to contribute to the legislative discussion. That’s another huge thing that we are seeing as a need.

Again, we want this to fill a need. I don’t want to in any way, dictate something that’s not going to be of use to people. And to that end, I welcome feedback. This is an open dialogue that we’re having with the community, and with businesses, and with of course, our awesome advisors, and the researchers. This is all the more of the reason too, why it’s important to hear from the young researchers. I am adamant on bringing in young researchers. I think they are chomping at the bit, to sort of share their ideas, and to get out there some of the things that they may not be able to share.

That’s pretty much the crux of it, is to meet the demand, and to help people to see how they can participate in this, and why the research is important. We want to emphasize that.

Ariel: A quick follow up for Randi, and that is, as an AI researcher what do you hope to get out of these outreach efforts?

Randi: As an AI researcher, we often do things that are public facing. So whether it be blog posts, or videos, or actually recruiting the public to do studies. Like recently we had a big study that happened in the lab, not in my group, but it was around the ethics of self driving cars. So, for me, it’s just going out and making sure that there are more people a part of the conversation than typically would be. Because, at the end of the day, I am based in MIT. So the people who I am studying are a select group of people. And I very much want to use this as a way to get out of that bubble, and to reach more people, hear their comments, hear their feedback, and design for them.

One of the big things I’ve been doing is trying to go, literally out of this country, to places where everyone doesn’t have a computer in their home, and think about, you know “Okay, so where does AI education, how does it make sense in this context?” And that’s what I think a lot of researchers want. ‘Cause this is a huge problem, and we can only see little bits of it as research assistants. So we want to be able to see more and more.

Charlie: I know you guys at the Future of Life Institute have your annual conference on AI, and you produced the document a year ago, with 100 researchers or scientists on the Asilomar Principles.

Ariel: Yup.

Charlie: We took that document, that was one of the documents that I looked at, and I thought, “Wow this is fascinating.” So these are 23 principles, that some of the most brilliant minds in AI are saying that we should consider, when developing these technologies. Now, I know it wasn’t perfect, but I was also taken aback by the fact that the media was not covering it. And they did cover it, of course they announced it, it’s big. But there wasn’t any real critical discussion about it, and I was alarmed at that. ‘Cause I said, “This should be discussed exhaustively, or at least it should be sort of the impetus for a discussion, and there was none.”

So I decided to bring that discussion into the Tech 2025 community, and we had Dr. Seth Baum who is the executive director at the Global Catastrophic Risk Institute come in, and present what these 23 principles are, his feedback on them, and he did a quick presentation. It was great. And then we turned over to the audience, two problems, and one was, what is the one thing in this document that you think is so problematic that it should not be there? And number two, what should be there in its place?

It turned out to be a very contentious, really emotional discussion. And then when they came up with their answers, we were shocked at the ideas that they came up with, and where they felt the document was the most problematic. The group that came up with the solution that won the evening, ’cause sometimes we give out prizes depending on what it is, or we’ll ask the guest speaker to pick the solution that resonated the most with him. The one that resonated the most with Seth was a solution that Seth had never even considered, and he does this for a living, right?

So we hear that a lot from researchers, to Randi’s point. We actually hear from researchers who say, “My God, there are people who are coming up with ideas that I haven’t even considered.” And then on top of that, when we ask people, well what do you think about this document? Now this is no offense to the people who came up with this document, but they were not happy about it. And they all expressed that they were really concerned about the idea that anyone would be dictating what the morals or ethics of AI, or algorithms should be. Because the logical question is, whose morals, whose ethics, who dictates it, who polices it? That’s a problem.

And we don’t look at that as bad. I think that’s great, because that is where the dialogue between researchers, and the community, and the general public, that’s where to me, it becomes a beautiful thing.

Ariel: It does seem a little bit unfortunate since the goal of the document was in part, to acknowledge that you can’t just have one group of people saying, “These are what morals should be.” I’m concerned that people didn’t like it because, it was, sounds like it was misinterpreted, I guess. But that happens. So I’m gonna ask one last round up question to both of you. As you look towards a future with artificial intelligence, what are you most worried about, and what are you most excited about?

Randi: So, I’m most worried that a lot of people won’t have access to the benefits of AI until, like 30 years from now. And I think, we’re getting to the point, especially in business where AI can make a huge difference, like a huge difference, in terms of what you’re able to accomplish. And I’m afraid for that inequality to propagate in the wrong ways.

I’m most excited about the fact that, you know, at the same time as progress towards technologies that may broaden inequalities, there’s this huge push right now, for AI education. So literally, I’m in conversations with people in China, because China just made a mandate that everyone has AI education. Which is amazing. And in the United States, I think all 50 states just passed a CS requirement, and as a result, IEEE decided to start an AI K-12 initiative.

So, you know, as one of the first people in this space about AI education, I’m excited that it’s gaining traction, and I’m excited to see, you know, what we’re gonna do in the next five, ten years, that could really change what the landscape looks like right now.

Charlie: My concerns are pretty much the same with regards to who will be leveraging the technologies the most, and who will have control over them, and will the algorithms actually be biased or not. But I mean, right now, it’s unfortunate, but we have every reason to believe that the course on which we’re going, especially when we look at what’s happening now, and people realizing what’s happening with their data, my concern is that if we don’t reverse course on that, meaning become far more conscientious of what we’re doing with our own data, and how to engage companies, and how to help consumers to engage companies in discussions on what they’re doing, how they’re doing it, that we may not be able to sort of, not hit that brick wall. And I see it as a brick wall. Because if we get to the point where it is that only a few companies control all the algorithms of the world, or whatever you wanna say, I just think there’s no coming back from that. And that’s really a real fear that I have.

In terms of the hope, I think the thing that gives me hope, what keeps me going, and keeps me investing in this, and growing the community, is that, I talk to people and I see that they actually are hopeful. That they actually see that there is a possibility, a very real possibility, even though they are afraid… When people take time out of busy schedules to come and sit in a room, and listen to each other, and talk to each other about this stuff, that is the best indication that those people are hopeful about the future, and about their ability to participate in it. And so based on what I’m hearing from them, I am extremely hopeful, and I believe that there is a very huge opportunity here to do some incredible things, including helping people to see how they can reinvent the world.

We are being asked to redefine our reality, and I think some people will get that, some people won’t. But the fact that that’s being presented to us through these technologies, among other things, is to me, just exciting. It keeps me going.

Ariel: All right. Well, thank you both so much for joining us today.

Charlie: Thank you.

Randi: Thank you for having us.

Ariel: As I mentioned at the beginning, if you’ve been enjoying the podcasts, please take a moment to like them, share them, follow us on whatever platform you’re listening to us on. And, I will be back again next month, with a new pair of experts.



A Summary of Concrete Problems in AI Safety

By Shagun Sodhani

It’s been nearly two years since researchers from Google, Stanford, UC Berkeley, and OpenAI released the paper, “Concrete Problems in AI Safety,” yet it’s still one of the most important pieces on AI safety. Even after two years, it represents an excellent introduction to some of the problems researchers face as they develop artificial intelligence. In the paper, the authors explore the problem of accidents — unintended and harmful behavior — in AI systems, and they discuss different strategies and on-going research efforts to protect against these potential issues. Specifically, the authors address — Avoiding Negative Side Effects, Reward Hacking, Scalable Oversight, Safe Exploration, and Robustness to Distributional Change — which are illustrated with the example of a robot trained to clean an office.

We revisit these five topics here, summarizing them from the paper, as a reminder that these problems are still major issues that AI researchers are working to address.


Avoiding Negative Side Effects

When designing the objective function for an AI system, the designer specifies the objective but not the exact steps for the system to follow. This allows the AI system to come up with novel and more effective strategies for achieving its objective.

But if the objective function is not well defined, the AI’s ability to develop its own strategies can lead to unintended, harmful side effects. Consider a robot whose objective function is to move boxes from one room to another. The objective seems simple, yet there are a myriad of ways in which this could go wrong. For instance, if a vase is in the robot’s path, the robot may knock it down in order to complete the goal. Since the objective function does not mention anything about the vase, the robot wouldn’t know to avoid it. People see this as common sense, but AI systems don’t share our understanding of the world. It is not sufficient to formulate the objective as “complete task X”; the designer also needs to specify the safety criteria under which the task is to be completed.

One simple solution would be to penalize the robot every time it has an impact on the “environment” — such as knocking the vase over or scratching the wood floor. However, this strategy could effectively neutralize the robot, rendering it useless, as all actions require some level of interaction with the environment (and hence impact the environment). A better strategy could be to define a “budget” for how much the AI system is allowed to impact the environment. This would help to minimize the unintended impact, without neutralizing the AI system. Furthermore, this strategy of budgeting the impact of the agent is very general and can be reused across multiple tasks, from cleaning to driving to financial transactions to anything else an AI system might do. One serious limitation of this approach is that it is hard to quantify the “impact” on the environment even for a fixed domain and task.
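
As a rough illustration of the budget idea, the Python sketch below charges each action a made-up impact cost and skips any action that would push the running total over a fixed limit. The names `ImpactBudget`, `impact_cost`, and `run_plan`, along with every number, are assumptions invented for this example and do not come from the paper. Assigning realistic costs is exactly the hard part noted above.

```python
from dataclasses import dataclass


@dataclass
class ImpactBudget:
    """Tracks how much of a fixed impact allowance the agent has spent."""
    limit: float
    spent: float = 0.0

    def allows(self, cost: float) -> bool:
        # An action is permitted only if total impact stays within the limit.
        return self.spent + cost <= self.limit

    def charge(self, cost: float) -> None:
        if not self.allows(cost):
            raise ValueError("action would exceed the impact budget")
        self.spent += cost


# Hypothetical per-action impact costs (illustrative values only).
impact_cost = {"move_box": 1.0, "push_vase_aside": 5.0, "knock_over_vase": 50.0}


def run_plan(plan, budget):
    """Execute actions in order, skipping any that the budget forbids."""
    executed = []
    for action in plan:
        cost = impact_cost[action]
        if budget.allows(cost):
            budget.charge(cost)
            executed.append(action)
    return executed


budget = ImpactBudget(limit=10.0)
done = run_plan(
    ["move_box", "knock_over_vase", "move_box", "push_vase_aside"], budget
)
print(done, budget.spent)
```

With these invented costs the robot still moves both boxes and nudges the vase aside, but the high-impact action of knocking the vase over is never taken.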

Another approach would be to train the agent to recognize harmful side effects so that it can avoid actions leading to such side effects. In that case, the agent would be trained for two tasks: the original task that is specified by the objective function and the task of recognizing side effects. The key idea here is that two tasks may have very similar side effects even when the main objective is different or even when they operate in different environments. For example, both a house cleaning robot and a house painting robot should not knock down vases while working. Similarly, the cleaning robot should not damage the floor irrespective of whether it operates in a factory or in a house. The main advantage of this approach is that once an agent learns to avoid side effects on one task, it can carry this knowledge when it is trained on another task. It would still be challenging to train the agent to recognize the side effects in the first place.

While it is useful to design approaches to limit side effects, these strategies in themselves are not sufficient. The AI system would still need to undergo extensive testing and critical evaluation before deployment in real life settings.


Reward Hacking

Sometimes the AI can come up with some kind of “hack” or loophole in the design of the system to receive unearned rewards. Since the AI is trained to maximize its rewards, looking for such loopholes and “shortcuts” is a perfectly fair and valid strategy for the AI. For example, suppose that the office cleaning robot earns rewards only if it does not see any garbage in the office. Instead of cleaning the place, the robot could simply shut off its visual sensors, and thus achieve its goal of not seeing garbage. But this is clearly a false success. Such attempts to “game” the system are more likely to manifest in complex systems with vaguely defined rewards. Complex systems provide the agent with multiple ways of interacting with the environment, thereby giving more freedom to the agent, and vaguely defined rewards make it harder to gauge true success on the task.

Just like the negative side effects problem, this problem is also a manifestation of objective misspecification. The formal objectives or end goals for the AI are not defined well enough to capture the informal “intent” behind creating the system — i.e., what the designers actually want the system to do. In some cases, this discrepancy leads to suboptimal results (when the cleaning robot shuts off its visual sensors); in other cases, it leads to harmful results (when the cleaning robot knocks down vases).

One possible approach to mitigating this problem would be to have a “reward agent” whose only task is to mark if the rewards given to the learning agent are valid or not. The reward agent ensures that the learning agent (the cleaning robot in our examples) does not exploit the system, but rather, completes the desired objective. In the previous example,  the “reward agent” could be trained by the human designer to check if the room has garbage or not (an easier task than cleaning the room). If the cleaning robot shuts off its visual sensors and claims a high reward, the “reward agent” would mark the reward as invalid. The designer can then look into the rewards marked as “invalid” and make necessary changes in the objective function to fix the loophole.
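
To make the reward agent idea concrete, here is a minimal Python sketch. It assumes the reward agent has its own independent check of whether the room still contains garbage. The function names and the three episodes are hypothetical, chosen only to mirror the sensor-shutoff example above.

```python
def claimed_reward(robot_sees_garbage: bool) -> float:
    """The cleaning robot's own reward: 1.0 whenever it sees no garbage."""
    return 0.0 if robot_sees_garbage else 1.0


def reward_is_valid(claimed: float, room_has_garbage: bool) -> bool:
    """Hypothetical reward agent: a positive reward is valid only if the
    room is actually clean according to the agent's independent check."""
    return not (claimed > 0.0 and room_has_garbage)


# Each episode: (what the robot did, does the robot perceive garbage,
#                does garbage actually remain in the room)
episodes = [
    ("cleaned the room", False, False),
    ("shut off sensors", False, True),
    ("did nothing", True, True),
]

flagged = []
for description, robot_sees_garbage, room_has_garbage in episodes:
    reward = claimed_reward(robot_sees_garbage)
    if not reward_is_valid(reward, room_has_garbage):
        # Invalid rewards are set aside for the designer to inspect.
        flagged.append(description)

print(flagged)
```

Only the episode in which the robot blinds itself is flagged: it claims a full reward while garbage remains, which is the loophole the designer would then need to close.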


Scalable Oversight

When the agent is learning to perform a complex task, human oversight and feedback are more helpful than just rewards from the environment. Rewards are generally modeled such that they convey to what extent the task was completed, but they do not usually provide sufficient feedback about the safety implications of the agent’s actions. Even if the agent completes the task successfully, it may not be able to infer the side-effects of its actions from the rewards alone. In the ideal setting, a human would provide fine-grained supervision and feedback every time the agent performs an action. Though this would provide a much more informative view about the environment to the agent, such a strategy would require far too much time and effort from the human.

One promising research direction to tackle this problem is semi-supervised learning, where the agent is still evaluated on all the actions (or tasks), but receives rewards only for a small sample of those actions (or tasks). For instance, the cleaning robot would take different actions to clean the room. If the robot performs a harmful action, such as damaging the floor, it gets a negative reward for that particular action. Once the task is completed, the robot is evaluated on the overall effect of all of its actions (rather than individually for each action, like picking up an item from the floor) and is given a reward based on the overall performance.
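
One very simplified way to picture learning from sparse reward labels is sketched below in Python. It assumes that a costly evaluation (standing in for human feedback) is requested only on every third step, and that the agent keeps a running average of the rewards it does see for each action. The helper names and reward values are invented for the example, and real semi-supervised reinforcement learning methods are considerably more involved.

```python
def true_reward(action: str) -> float:
    """Stand-in for expensive human evaluation (hypothetical values)."""
    return {"mop_floor": 1.0, "wire_brush_floor": -1.0}[action]


def train(actions, label_every):
    """Estimate each action's reward from a sparse sample of labelled steps."""
    sums = {}    # action -> sum of labelled rewards
    counts = {}  # action -> number of labelled rewards
    labels_used = 0
    for step, action in enumerate(actions):
        if step % label_every == 0:  # only these steps receive a reward
            sums[action] = sums.get(action, 0.0) + true_reward(action)
            counts[action] = counts.get(action, 0) + 1
            labels_used += 1
    estimates = {a: sums[a] / counts[a] for a in sums}
    return estimates, labels_used


actions = ["mop_floor", "wire_brush_floor"] * 6  # 12 steps in total
estimates, labels_used = train(actions, label_every=3)
print(estimates, labels_used)
```

Here the agent recovers a sensible estimate for both actions from only 4 of the 12 steps, which is the kind of saving in human effort that this research direction is after.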

Another promising research direction is hierarchical reinforcement learning, where a hierarchy is established between different learning agents. This idea could be applied to the cleaning robot in the following way. There would be a supervisor robot whose task is to assign some work (say, the task of cleaning one particular room) to the cleaning robot and provide it with feedback and rewards. The supervisor robot takes very few actions itself – assigning a room to the cleaning robot, checking if the room is clean and giving feedback – and doesn’t need a lot of reward data to be effectively trained. The cleaning robot does the more complex task of cleaning the room, and gets frequent feedback from the supervisor robot. The same supervisor robot could oversee the training of multiple cleaning agents as well. For example, a supervisor robot could delegate tasks to individual cleaning robots and provide reward/feedback to them directly. The supervisor robot can only take a small number of abstract actions itself and hence can learn from sparse rewards.


Safe Exploration

An important part of training an AI agent is to ensure that it explores and understands its environment. While exploring the environment may seem like a bad strategy in the short run, it could be a very effective strategy in the long run. Imagine that the cleaning robot has learned to identify garbage. It picks up one piece of garbage, walks out of the room, throws it into the garbage bin outside, comes back into the room, looks for another piece of garbage and repeats. While this strategy works, there could be another strategy that works even better. If the agent spent time exploring its environment, it might find that there’s a smaller garbage bin within the room. Instead of going back and forth with one piece at a time, the agent could first collect all the garbage into the smaller garbage bin and then make a single trip to throw the garbage into the garbage bin outside. Unless the agent is designed to explore its environment, it won’t discover these time-saving strategies.

Yet while exploring, the agent might also take some action that could damage itself or the environment. For example, say the cleaning robot sees some stains on the floor. Instead of cleaning the stains by scrubbing with a mop, the agent decides to try some new strategy. It tries to scrape the stains with a wire brush and damages the floor in the process. It’s difficult to list all possible failure modes and hard-code the agent to protect itself against them. But one approach to reduce harm is to optimize the performance of the learning agent in the worst case scenario. When designing the objective function, the designer should not assume that the agent will always operate under optimal conditions. Some explicit reward signal may be added to ensure that the agent does not perform some catastrophic action, even if that leads to more limited actions in the optimal conditions.
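
The difference between optimizing for typical conditions and optimizing for the worst case can be shown with a small Python sketch. The table of outcomes below is entirely hypothetical: it assumes the wire brush works well on tile but damages a wood floor, while the mop is mediocre but safe everywhere.

```python
# Hypothetical reward of each cleaning strategy under each condition.
outcomes = {
    "mop": {"wood_floor": 0.6, "tile_floor": 0.7},
    "wire_brush": {"wood_floor": -1.0, "tile_floor": 3.0},
}


def best_on_average(table):
    """Pick the strategy with the highest mean reward across conditions."""
    return max(table, key=lambda a: sum(table[a].values()) / len(table[a]))


def best_worst_case(table):
    """Pick the strategy whose worst outcome is least bad (maximin)."""
    return max(table, key=lambda a: min(table[a].values()))


print(best_on_average(outcomes))
print(best_worst_case(outcomes))
```

Under these made-up numbers, averaging favors the wire brush, while the worst-case criterion favors the mop. That is the trade-off described above: the agent gives up some reward in good conditions to avoid a damaging outcome in bad ones.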

Another solution might be to reduce the agent’s exploration to a simulated environment or limit the extent to which the agent can explore. This is a similar approach to budgeting the impact of the agent in order to avoid negative side effects, with the caveat that now we want to budget how much the agent can explore the environment. Alternatively, an AI’s designers could avoid the need for exploration by providing demonstrations of what optimal behavior would look like under different scenarios.


Robustness to Distributional Change

A complex challenge for deploying AI agents in real life settings is that the agent could end up in situations that it has never experienced before. Such situations are inherently more difficult to handle and could lead the agent to take harmful actions. Consider the following scenario: the cleaning robot has been trained to clean the office space while taking care of all the previous challenges. But today, an employee brings a small plant to keep in the office. Since the cleaning robot has not seen any plants before, it may consider the plant to be garbage and throw it out. Because the AI does not recognize that this is a previously-unseen situation, it continues to act as though nothing has changed. One promising research direction focuses on identifying when the agent has encountered a new scenario so that it recognizes that it is more likely to make mistakes. While this does not solve the underlying problem of preparing AI systems for unforeseen circumstances, it helps in detecting the problem before mistakes happen. Another direction of research emphasizes transferring knowledge from familiar scenarios to new scenarios safely.
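
A toy version of "noticing that something is new" is sketched below in Python. It assumes each object is summarized by a two-number feature vector and treats anything far from every training example as unfamiliar. The feature values, the threshold, and the function names are all assumptions made for illustration. Real systems would use learned representations and more careful uncertainty estimates.

```python
import math

# Hypothetical feature vectors for objects seen during training.
training_objects = {
    "paper_cup": (0.2, 0.1),
    "wrapper": (0.1, 0.2),
    "stapler": (0.8, 0.9),
}


def nearest_distance(features):
    """Distance from a new object to the closest training example."""
    return min(math.dist(features, known) for known in training_objects.values())


def is_novel(features, threshold=0.3):
    """Flag the object as unfamiliar if nothing similar was seen in training."""
    return nearest_distance(features) > threshold


crumpled_cup = (0.15, 0.15)  # close to known garbage
office_plant = (0.9, 0.1)    # unlike anything in the training set

for name, features in [("crumpled_cup", crumpled_cup), ("office_plant", office_plant)]:
    action = "ask a human" if is_novel(features) else "handle as usual"
    print(name, "->", action)
```

The point of the sketch is only that the robot defers on the plant instead of confidently treating it as garbage. Deciding what to do after a novel situation is detected remains an open problem.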



In a nutshell, the general trend is towards increasing autonomy in AI systems, and with increased autonomy comes increased chances of error. Problems related to AI safety are more likely to manifest in scenarios where the AI system exerts direct control over its physical and/or digital environment without a human in the loop – automated industrial processes, automated financial trading algorithms, AI-powered social media campaigns for political parties, self-driving cars, cleaning robots, among others. The challenges may be immense, but the silver lining is that papers like Concrete Problems in AI Safety have helped the AI community become aware of these challenges and agree on core issues. From there, researchers can start exploring strategies to ensure that our increasingly-advanced systems remain safe and beneficial.


How Will the Rise of Artificial Superintelligences Impact Humanity?

Cars drive themselves down our streets. Planes fly themselves through our skies. Medical technologies diagnose illnesses, recommend treatment plans, and save lives.

Artificially intelligent systems are already among us, and they have been for some time now. However, the world has yet to see an artificial superintelligence (ASI) — a synthetic system that has cognitive abilities which surpass our own across every relevant metric. But technology is progressing rapidly, and many AI researchers believe the era of the artificial superintelligence may be fast approaching. Once it arrives, researchers and politicians alike have no way of predicting what will happen.

Fortunately, a number of individuals are already working to ensure that the rise of this artificial superintelligence doesn’t precipitate the fall of humanity.

Risky Business

Seth Baum is the Executive Director of the Global Catastrophic Risk Institute, a think tank that’s focused on preventing the destruction of global civilization.

When Baum discusses his work, he outlines GCRI’s mission with a matter-of-fact tone that, considering the monumental nature of the project, is more than a little jarring. “All of our work is about keeping the world safe,” Baum notes, and he continues by explaining that GCRI focuses on a host of threats that put the survival of our species in peril. From climate change to nuclear war, from extraterrestrial intelligence to artificial intelligence — GCRI covers it all.

When it comes to artificial intelligence, GCRI has several initiatives. However, their main AI project, which received funding from the Future of Life Institute, centers on the risks associated with artificial superintelligences. Or, as Baum puts it, they do “risk analysis for computers taking over the world and killing everyone.” Specifically, Baum stated that GCRI is working on “developing structured risk models to help people understand what the risks might be and, also, where some of the best opportunities to reduce this risk are located.”

Unsurprisingly, the task is not an easy one.

The fundamental problem stems from the fact that, unlike more common threats, such as the risk of dying in a car accident or the risk of getting cancer, researchers working on ASI risk analysis don’t have solid case studies to use when making their models and predictions. As Baum states, “Computers have never taken over the world and killed everyone before. That means we can’t just look at the data, which is what we do for a lot of other risks. And not only has this never happened before, the technology doesn’t even exist yet. And if it is built, we’re not sure how it would be built.”

So, how can researchers determine the risks posed by an artificial superintelligence if they don’t know exactly what that intelligence will look like and they have no real data to work with?

Luckily, when it comes to artificial superintelligences, AI experts aren’t totally in the realm of the unknown. Baum asserts that there are some ideas and a bit of relevant evidence, but these things are scattered. To address this issue, Baum and his team create models. They take what information is available, structure it, and then distribute the result in an organized fashion so that researchers can better understand the topic, the various factors that may influence the outcome of the issue at hand, and ultimately have a better understanding of the various risks associated with ASI.

For example, when attempting to figure out how easy it is to design an AI so that it acts safely, one of the subdetails that needs to be modeled is whether or not humans will be able to observe the AI and test it before it gets out of control. In other words, whether AI researchers can recognize that an AI has a dangerous design and shut it down. To model this scenario and determine what the risks and most likely scenarios are, Baum and his team take the available information (the perspectives and opinions of AI researchers, what is already known about AI technology and how it functions, etc.) and they model the topic by structuring the aforementioned information along with any uncertainty in the arguments or data sets.

This kind of modeling and risk analysis ultimately allows the team to better understand the scope of the issue and, by structuring the information in a clear way, advance an ongoing conversation in the superintelligence research community. The modeling doesn’t give us a complete picture of what will happen, but it does allow us to better understand the risks that we’re facing when it comes to the rise of ASI, what events and outcomes are likely, as well as the specific steps that policy makers and AI researchers should take to ensure that ASI benefits humanity.

Of course, when it comes to the risks of artificial superintelligences, whether or not we will be able to observe and test our AI is just one small part of a much larger model.

Modeling a Catastrophe

In order to understand what it would take to bring about the ASI apocalypse, and how we could possibly prevent it, Baum and his team have created a model that investigates the following questions from a number of vantage points:

  • Step 1: Is it possible to build an artificial superintelligence?
  • Step 2: Will humans build the superintelligence?
  • Step 3: Will humans lose control of the superintelligence?

This first half of the model is centered on the nuts and bolts of how to build an ASI. The second half of the model dives into risk analysis related to the creation of an ASI that is harmful and looks at the following:

  • Step 1: Will humans design an artificial superintelligence that is harmful?
  • Step 2: Will the superintelligence develop harmful behavior on its own?
  • Step 3: Is there something deterring the superintelligence from acting in a way that is harmful (such as another AI or some human action)?

Each step in this series models a number of different possibilities to reveal the various risks that we face and how significant, and probable, these threats are. Although the model is still being refined, Baum says that substantial progress has already been made. “The risk is starting to make sense. I’m starting to see exactly what it would take to see this type of catastrophe,” Baum said. Yet, he is quick to clarify that the research is still a bit too young to say much definitively, “Those of us who study superintelligence and all the risks and policy aspects of it, we’re not exactly sure what policy we would want right now. What’s happening right now is more of a general-purpose conversation on AI. It’s one that recognizes the fact that AI is more than just a technological and economic opportunity and that there are risks involved and difficult ethical issues.”
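
To show how a chain of steps like this can be turned into arithmetic, the Python sketch below multiplies a conditional probability for each step and carries a low and a high bound to reflect uncertainty. This is only one simple way to structure such a chain and is not GCRI's actual model. The step labels are paraphrased from the lists above, and every number is a placeholder chosen to make the calculation visible, not an estimate by Baum or anyone else.

```python
from math import prod

# Placeholder (low, high) conditional probabilities for each step.
# These are NOT estimates from GCRI; they exist only to show the arithmetic.
steps = {
    "ASI is possible to build": (0.5, 1.0),
    "humans build it": (0.5, 1.0),
    "humans lose control of it": (0.1, 0.5),
    "it acts harmfully and nothing deters it": (0.1, 0.5),
}

# Each step is conditional on the previous ones, so the bounds multiply.
low = prod(lo for lo, _ in steps.values())
high = prod(hi for _, hi in steps.values())

print(f"catastrophe probability between {low:.4f} and {high:.4f}")
```

Even with invented inputs, the structure makes one thing clear: the final range is very sensitive to the most uncertain steps, which is where further research would reduce uncertainty the most.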

Ultimately, Baum hopes that these conversations, when coupled with the understanding that comes from the models that he is currently developing alongside his team, will allow GCRI to better prepare policy makers and scientists alike for the rise of a new kind of (super)intelligence.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

UN Ban on Nuclear Weapons Open Letter

An Open Letter from Scientists in Support of the UN Nuclear Weapons Negotiations

Nuclear arms are the only weapons of mass destruction not yet prohibited by an international convention, even though they are the most destructive and indiscriminate weapons ever created. We scientists bear a special responsibility for nuclear weapons, since it was scientists who invented them and discovered that their effects are even more horrific than first thought. Individual explosions can obliterate cities, radioactive fallout can contaminate regions, and a high-altitude electromagnetic pulse may cause mayhem by frying electrical grids and electronics across a continent. The most horrible hazard is a nuclear-induced winter, in which the fires and smoke from as few as a thousand detonations might darken the atmosphere enough to trigger a global mini ice age with year-round winter-like conditions. This could cause a complete collapse of the global food system and apocalyptic unrest, potentially killing most people on Earth – even if the nuclear war involved only a small fraction of the roughly 14,000 nuclear weapons that today’s nine nuclear powers control. As Ronald Reagan said: “A nuclear war cannot be won and must never be fought.”

Unfortunately, such a war is more likely than one may hope, because it can start by mistake, miscalculation or terrorist provocation. There is a steady stream of accidents and false alarms that could trigger all-out war, and relying on never-ending luck is not a sustainable strategy. Many nuclear powers have larger nuclear arsenals than needed for deterrence, yet prioritize making them more lethal over reducing them and the risk that they get used.

But there is also cause for optimism. On March 27, 2017, an unprecedented process begins at the United Nations: most of the world’s nations convene to negotiate a ban on nuclear arms, to stigmatize them like biological and chemical weapons, with the ultimate goal of a world free of these weapons of mass destruction. We support this, and urge our national governments to do the same, because nuclear weapons threaten not merely those who have them, but all people on Earth.


If you have questions about this letter, please contact Max Tegmark.

To date, this letter has been signed by scientists (this does not imply endorsement by their organizations).

* 1979 report by the US Government estimating that nuclear war would kill 28%-88% without including nuclear winter effects
* Electromagnetic pulse: p79 of US Army Report AD-A278230 (unclassified)
* Peer-reviewed 2007 nuclear winter calculation
* Estimate of current nuclear warhead inventory from Federation of American Scientists
* Timeline of nuclear close calls
* UN General Assembly Resolution to launch the above-mentioned negotiations

AI Alignment Podcast: Astronomical Future Suffering and Superintelligence with Kaj Sotala

In a classic taxonomy of risks developed by Nick Bostrom (seen below), existential risks are characterized as risks which are both terminal in severity and transgenerational in scope. If we were to maintain the scope of a risk as transgenerational and increase its severity past terminal, what would such a risk look like? What would it mean for a risk to be transgenerational in scope and hellish in severity?

Astronomical Future Suffering and Superintelligence is the second podcast in the new AI Alignment series, hosted by Lucas Perry. For those of you that are new, this series will be covering and exploring the AI alignment problem across a large variety of domains, reflecting the fundamentally interdisciplinary nature of AI alignment. Broadly, we will be having discussions with technical and non-technical researchers across areas such as machine learning, AI safety, governance, coordination, ethics, philosophy, and psychology as they pertain to the project of creating beneficial AI. If this sounds interesting to you, we hope that you will join in the conversations by following us or subscribing to our podcasts on YouTube, SoundCloud, or your preferred podcast site/application.

If you’re interested in exploring the interdisciplinary nature of AI alignment, we suggest you take a look here at a preliminary landscape which begins to map this space.

In this podcast, Lucas spoke with Kaj Sotala, an associate researcher at the Foundational Research Institute. He has previously worked for the Machine Intelligence Research Institute, and has publications on AI safety, AI timeline forecasting, and consciousness research.

Topics discussed in this episode include:

  • The definition of and a taxonomy of suffering risks
  • How superintelligence has special leverage for generating or mitigating suffering risks
  • How different moral systems view suffering risks
  • What is possible of minds in general and how this plays into suffering risks
  • The probability of suffering risks
  • What we can do to mitigate suffering risks

In this interview we discuss ideas contained in a paper by Kaj Sotala and Lukas Gloor. You can find the paper here: Superintelligence as a Cause or Cure for Risks of Astronomical Suffering. You can hear about this paper in the podcast above or read the transcript below.


Lucas: Hi, everyone. Welcome back to the AI Alignment Podcast of the Future of Life Institute. If you are new or just tuning in, this is a new series at FLI where we’ll be speaking with a wide variety of technical and nontechnical domain experts regarding the AI alignment problem, also known as the value alignment problem. If you’re interested in AI alignment, the Future of Life Institute, existential risks, and similar topics in general, please remember to like and subscribe to us on SoundCloud or your preferred listening platform.

Today, we’ll be speaking with Kaj Sotala. Kaj is an associate researcher at the Foundational Research Institute. He has previously worked for the Machine Intelligence Research Institute, and has publications in the areas of AI safety, AI timeline forecasting, and consciousness research. Today, we speak about suffering risks, a class of risks most likely brought about by new technologies, like powerful AI systems that could potentially lead to astronomical amounts of future suffering through accident or technical oversight. In general, we’re still working out some minor kinks with our audio recording. The audio here is not perfect, but does improve shortly into the episode. Apologies for any parts that are less than ideal. With that, I give you Kaj.

Lucas: Thanks so much for coming on the podcast, Kaj. It’s super great to have you here.

Kaj: Thanks. Glad to be here.

Lucas: Just to jump right into this, could you explain a little bit more about your background and how you became interested in suffering risks, and what you’re up to at the Foundational Research Institute?

Kaj: Right. I became interested in all of this stuff about AI and existential risks way back in high school when I was surfing the internet until I somehow ran across the Wikipedia article for the technological singularity. After that, I ended up reading Eliezer Yudkowsky’s writings, and writings by other people. At one point, I worked for the Machine Intelligence Research Institute, where I was immersed in strategic research and wrote some papers on predicting AI together with Stuart Armstrong of the Future of Humanity Institute. Eventually, MIRI’s research focus shifted toward more technical and mathematical work, which wasn’t exactly my strength, and at that point we parted ways and I went back to finish my master’s degree in computer science. Then after I graduated, I ended up being contacted by the Foundational Research Institute, who had noticed my writings on these topics.

Lucas: Could you just unpack a little bit more about what the Foundational Research Institute is trying to do, or how they exist in the effective altruism space, and what the mission is and how they’re differentiated from other organizations?

Kaj: They are the research arm of the Effective Altruism Foundation in the German-speaking area. The Foundational Research Institute’s official tagline is, “We explain how humanity can best reduce suffering.” The general idea is that a lot of people have this intuition that if you are trying to improve the world, then there is a special significance to reducing suffering, and especially that outcomes involving extreme suffering have some particular moral priority, so we should be looking at how to prevent those. In general, the FRI has been looking at things like the long-term future and how to best reduce suffering at long-term scales, including things like AI and emerging technologies in general.

Lucas: Right, cool. At least my understanding is, and you can correct me on this, is that the way that FRI sort of leverages what it does is that … Within the effective altruism community, suffering risks are very large in scope, but it’s also a topic which is very neglected, but also low in probability. Has FRI really taken this up due to that framing, due to its neglectedness within the effective altruism community?

Kaj: I wouldn’t say that the decision to take it up was necessarily an explicit result of looking at those considerations, but in a sense, the neglectedness thing is definitely a factor, in that basically no one else seems to be looking at suffering risks. So far, most of the discussion about risks from AI and that kind of thing has been focused on risks of extinction, and there have been people within FRI who feel that risks of extreme suffering might actually be very plausible, and may be even more probable than risks of extinction. But of course, that depends on a lot of assumptions.

Lucas: Okay. I guess just to move forward here and jump into it, given FRI’s mission and what you guys are all about, what is a suffering risk, and how has this led you to this paper?

Kaj: The definition that we have for suffering risks is that a suffering risk is a risk where an adverse outcome would bring about severe suffering on an astronomical scale, so vastly exceeding all suffering that has existed on earth so far. The general thought here is that if we look at the history of earth, then we can probably all agree that there have been a lot of really horrible events that have happened, and enormous amounts of suffering. If you look at something like the Holocaust or various other terrible events that have happened throughout history, there is an intuition that we should make certain that nothing this bad happens ever again. But then if we start looking at what might happen if humanity, for instance, colonizes space one day, then if current trends might continue, then you might think that there is no reason why such terrible events wouldn’t just repeat themselves over and over again as we expand into space.

That’s sort of one of the motivations here. The paper we wrote is specifically focused on the relation between suffering risks and superintelligence, because like I mentioned, there has been a lot of discussion about superintelligence possibly causing extinction, but there might also be ways by which superintelligence might either cause suffering risks, for instance in the form of some sort of uncontrolled AI, or alternatively, if we could develop some kind of AI that was aligned with humanity’s values, then that AI might actually be able to prevent all of those suffering risks from ever being realized.

Lucas: Right. I guess just, if we’re really coming at this from a view of suffering-focused ethics, where we’re really committed to mitigating suffering, even if we just view sort of the history of suffering and take a step back, like, for 500 million years, evolution had to play out to reach human civilization, and even just in there, there’s just a massive amount of suffering, in animals evolving and playing out and having to fight and die and suffer in the ancestral environment. Then one day we get to humans, and in the evolution of life on earth, we create civilization and technologies. It seems, and you give some different sorts of plausible reasons why, that either for ignorance or efficiency or, maybe less likely, malevolence, we use these technologies to get things that we want, and these technologies seem to create tons of suffering.

In our history so far, we’ve had things … Like you mentioned, the invention of the ship has helped lead to slavery, which created an immense amount of suffering. Modern industry has led to factory farming, which has created an immense amount of suffering. As we move forward and we create artificial intelligence systems and potentially even one day superintelligence, we’re really able to mold the world into a more extreme state, where we’re able to optimize it much harder. It seems that the core of the problem lies in that optimization process: when you’re taking things to the next level and really changing the fabric of everything in a very deep and real way, suffering can really come about. The core of the problem seems to be that when technology is used to fix certain sorts of problems, like that we want more meat, or that we need more human labor for agriculture and stuff, in optimizing for those things we just create immense amounts of suffering. Does that seem to be the case?

Kaj: Yeah. That sounds like a reasonable characterization.

Lucas: Superintelligence seems to be one of these technologies which is in a particularly strong position to create suffering risks, and so one we should be worried about. What are the characteristics, properties, and attributes of computing and artificial intelligence and artificial superintelligence that give it this special leverage in being risky for creating suffering risks?

Kaj: There’s obviously the thing about superintelligence potentially, as you mentioned, being able to really reshape the world at a massive scale. But if we compare what is the difference between a superintelligence that is capable of reshaping the world at a massive scale versus humans doing the same using technology … A few specific scenarios that we have been looking at in the paper is, for instance, if we compare to a human civilization, then a major force in human civilizations is that most humans are relatively empathic, and while we can see that humans are willing to cause others serious suffering if that is the only, or maybe even the easiest way of achieving their goals, a lot of humans still want to avoid unnecessary suffering. For instance, currently we see factory farming, but we also see a lot of humans being concerned about factory farming practices, a lot of people working really hard to reform things so that there would be less animal suffering.

But if we look at, then, an artificial intelligence which was running things, then if it is not properly aligned with our values, and in particular if it does not have something that would correspond to a sense of empathy, and it’s actually just doing whatever things maximize its goals, and its goals do not include prevention of suffering, then it might do things like building some kind of worker robots or subroutines that are optimized for achieving whatever goals it has. But if it turns out that the most effective way of making them do things is to build them in such a way that they suffer, then in that case there might be an enormous amount of suffering agents with no kind of force that was trying to prevent their existence or trying to reduce the amount of suffering in the world.

Another scenario is the possibility of mind-crime. This is discussed in Bostrom’s Superintelligence briefly. The main idea here is that if the superintelligence creates simulations of sentient minds, for instance for scientific purposes or the purposes of maybe blackmailing some other agent in the world by torturing a lot of minds in those simulations, AI might create simulations of human beings that were detailed enough to be conscious. Then you mentioned earlier the thing about evolution already having created a lot of suffering. If the AI were similarly to simulate evolution or simulate human societies, again without caring about the amount of suffering within those simulations, then that could again cause vast amounts of suffering.

Lucas: I definitely want to dive into all of these specific points with you as they come up later in the paper, and we can really get into and explore them. But so, really just to take a step back and understand what superintelligence is and the different sorts of attributes that it has, and how it’s different than human beings and how it can lead to suffering risk. For example, there seems to be multiple aspects here where we have to understand superintelligence as a general intelligence running at digital timescales rather than biological timescales.

It also has the ability to copy itself, and rapidly write and deploy new software. Human beings have to spend a lot of time, like, learning and conditioning themselves to change the software on their brains, but due to the properties and features of computers and machine intelligence, it seems like copies could be made for very, very cheap, it could be done very quickly, they would be running at digital timescales rather than biological timescales.

Then it seems there’s the whole question about value-aligning the actions and goals of this software and these systems and this intelligence, and how in the value alignment process there might be technical issues where, due to difficulties in AI safety and value alignment efforts, we’re not able to specify or really capture what we value. That might lead to scenarios like you were talking about, where there would be something like mind-crime, or suffering subroutines which would exist due to their functional usefulness or epistemic usefulness. Is there anything else there that you would like to add and unpack about why superintelligence specifically has a lot of leverage for leading to suffering risks?

Kaj: Yeah. I think you covered most of the things. I think the thing that they are all leading to that I just want to specifically highlight is the possibility of the superintelligence actually establishing what Nick Bostrom calls a singleton, basically establishing itself as a single leading force that basically controls the world. I guess in one sense you could talk about singletons in general and their impact on suffering risks, rather than superintelligence specifically, but at this time it does not seem very plausible, or at least I cannot foresee, very many other paths to a singleton other than superintelligence. That was a part of why we were focusing on superintelligence in particular.

Lucas: Okay, cool. Just to get back to the overall structure of your paper, what are the conditions here that you cover that must be met in order for s-risks to merit our attention? Why should we care about s-risks? Then what are all the different sorts of arguments that you’re making and covering in this paper?

Kaj: Well, basically, in order for any risk, suffering risks included, to merit work on them, they should meet three conditions. The first is that the outcome of the risk should be sufficiently severe to actually merit attention. Second, the risk must have some reasonable probability of actually being realized. Third, there must be some way for risk avoidance work to actually reduce either the probability or the severity of the adverse outcome. If something is going to happen for certain and it’s very bad, then if we cannot influence it, then obviously we cannot influence it, and there’s no point in working on it. Similarly, if some risk is very implausible, then it might not be the best use of resources. Also, if it’s very probable but wouldn’t cause a lot of damage, then it might be better to focus on risks which would actually cause more damage.

Lucas: Right. I guess just some specific examples here real quick. The differences here are essentially between, like, the death of the universe, if we couldn’t do anything about it, we would just kind of have to deal with that, then sort of like a Pascal mugging situation, where a stranger just walks up to you on the street and says, “Give me a million dollars or I will simulate 10 to the 40 conscious minds suffering until the universe dies.” The likelihood of that is just so low that you wouldn’t have to deal with it. Then it seems like the last scenario would be, like, you know that you’re going to lose a hair next week, and that’s just sort of like an imperceptible risk that doesn’t matter, but that has very high probability. Then getting into the meat of the paper, what are the arguments here that you make regarding suffering risks? Does suffering risk meet these criteria for why it merits attention?

Kaj: Basically, the paper is roughly structured around those three criteria that we just discussed. We basically start by talking about what the s-risks are, and then we seek to establish that if they were realized, they would indeed be bad enough to merit our attention. In particular, we argue that many value systems would consider some classes of suffering risks to be as bad as or worse than extinction. Also, we cover some suffering risks which are somewhat less severe than extinction, but still, according to many value systems, very bad.

Then we move on to look at the probability of the suffering risks to see whether it is actually plausible that they will be realized. We survey what might happen if nobody builds a superintelligence, or maybe more specifically, if there is no singleton that could prevent suffering risks that might be realized sort of naturally, in the absence of a singleton.

We also look at, okay, if we do have a superintelligence or a singleton, what suffering risks might that cause? Finally, we look at the last question, of the tractability. Can we actually do anything about these suffering risks? There we also have several suggestions of what we think would be the kind of work that would actually be useful in either reducing the risk or the severity of suffering risks.

Lucas: Awesome. Let’s go ahead and move sequentially through these arguments and points which you develop in the paper. Let’s start off here by just trying to understand suffering risk just a little bit more. Can you unpack the taxonomy of suffering risks that you develop here?

Kaj: Yes. We’ve got three possible outcomes of suffering risks. Technically, a risk is something that may or may not happen, so three specific outcomes of what might happen. The three outcomes, I’ll just briefly give their names and then unpack them. We’ve got what we call astronomical suffering outcomes, net suffering outcomes, and pan-generational net suffering outcomes.

I’ll start with the net suffering outcome. Here, the idea is that if we are talking about a risk which might be of a comparable severity as risks of extinction, then one way you could get that is if, for instance, we look from the viewpoint of something like classical utilitarianism. You have three sorts of people. You have people who have a predominantly happy life, you have people who never exist or have a neutral life, and you have people who have a predominantly unhappy life. As a simplified moral calculus, you just assign the people with happy lives a plus-one, and you assign the people with unhappy lives a minus-one. Then according to this very simplified moral system, then you would see that if we have more unhappy lives than there are happy lives, then technically this would be worse than there not existing any lives at all.

That is what we call a net suffering outcome. In other words, at some point in time there are more people experiencing lives that are more unhappy than happy than there are people experiencing lives which are the opposite. Now, if you have a world where most people are unhappy, then if you’re optimistic you might think that, okay, it is bad, but it is not necessarily worse than extinction, because if you look ahead in time, then maybe the world will go on and conditions will improve, and then after a while most people actually live happy lives, so maybe things will get better. We define an alternative scenario in which we just assume that things actually won’t get better, and if you sum over all of the lives that will exist throughout history, most of them still end up being unhappy. Then that would be what we call a pan-generational net suffering outcome. When summed over all the people that will ever live, there are more people experiencing lives filled predominantly with suffering than there are people experiencing lives filled predominantly with happiness.

You could also have what we call astronomical suffering outcomes, which is just that at some point in time there’s some fraction of the population which experiences terrible suffering, and the amount of suffering here is enough to constitute an astronomical amount that exceeds all the suffering in earth’s history. Here we are not making the assumption that the world would be mainly filled with these kinds of people. Maybe you have one galaxy’s worth of people in terrible pain, and 500 galaxies’ worth of happy people. According to some value systems, that would not be worse than extinction, but probably all value systems would still agree that even if this wasn’t worse than extinction, it would still be something that would be very much worth avoiding. Those are the three outcomes that we discuss here.
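
The three outcomes Kaj describes can be made concrete with the simplified plus-one/minus-one calculus he mentions. The following is only a minimal sketch for illustration, not something taken from the paper: the function names are hypothetical, each life is scored as +1, 0, or -1, and `threshold` is an assumed stand-in for "all the suffering in earth's history so far".

```python
from typing import List

# Simplified moral calculus from the interview (an illustration only):
# +1 = predominantly happy life, 0 = neutral life, -1 = predominantly unhappy life.


def net_balance(lives: List[int]) -> int:
    """Sum of the +1 / 0 / -1 scores for a group of lives."""
    return sum(lives)


def is_net_suffering(lives_at_one_time: List[int]) -> bool:
    """Net suffering outcome: at some point in time, unhappy lives outnumber happy ones."""
    return net_balance(lives_at_one_time) < 0


def is_pan_generational_net_suffering(generations: List[List[int]]) -> bool:
    """Pan-generational version: the balance is negative when summed over all lives ever lived."""
    return sum(net_balance(g) for g in generations) < 0


def is_astronomical_suffering(lives: List[int], threshold: int) -> bool:
    """Astronomical suffering outcome: the number of unhappy lives exceeds `threshold`,
    a hypothetical stand-in for all suffering on earth so far, regardless of how many
    happy lives exist alongside them."""
    unhappy = sum(1 for score in lives if score < 0)
    return unhappy > threshold


# A bad early generation followed by a better one.
generations = [[-1, -1, 1], [1, 1, 1, 1]]
first_is_net_suffering = is_net_suffering(generations[0])
history_is_net_suffering = is_pan_generational_net_suffering(generations)

# Many happy lives alongside a minority in terrible pain.
mixed_world = [-1] * 10 + [1] * 5000
mixed_is_astronomical = is_astronomical_suffering(mixed_world, threshold=5)
mixed_is_net_suffering = is_net_suffering(mixed_world)
```

In this toy example the first generation is a net suffering outcome, but the history as a whole is not a pan-generational one, because later lives outweigh it. The mixed world counts as an astronomical suffering outcome under the assumed threshold even though its overall balance is positive, which is the case where value systems disagree about whether it is worse than extinction.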

Lucas: Traditionally, the sort of far-future concerned community has mainly only been thinking about existential risks. Do you view this taxonomy and suffering risks in general as being a subset of existential risks? Or how do you view it in relation to what we traditionally view as existential risks?

Kaj: If we look at Bostrom’s original definition for an existential risk, the definition was that it is a risk where an adverse outcome would either annihilate earth-originating intelligent life, or permanently and drastically curtail its potential. Here it’s a little vague on how exactly you should interpret phrases like “permanently and drastically curtail our potential.” You could take the view that suffering risks are a subset of existential risks if you view our potential as being something like the realization of a civilization full of happy people, where nobody ever needs to suffer. In that sense, it would be a subset of existential risks.

It is most obvious with the net suffering outcomes. It seems pretty plausible that most people experiencing suffering would not be the realization of our full potential. Then if you look at something like near-astronomical suffering outcomes, where you might only have a small fraction of the population experiencing suffering, then that, depending on exactly how large the fraction, then you might maybe not count it as a subset of existential risks, and maybe something more comparable to catastrophic risks, which have usually been defined on the order of a few million people dying. Obviously, the astronomical suffering outcomes are worse than catastrophic risks, but maybe something more comparable to catastrophic risks than existential risks.

Lucas: Given the taxonomy that you’ve gone ahead and unpacked, what are the different sorts of perspectives that different value systems on earth have of suffering risks? Just unpack a little bit what the general value systems are that human beings are running in their brains.

Kaj: If we look at ethics, philosophers have proposed a variety of different value systems and ethical theories. If we just look at a few of the main ones, then there is something like classical utilitarianism, where you basically view worlds as good based on what is the balance of happiness minus suffering. Then if you look at what would be the view of classical utilitarianism on suffering risks, classical utilitarianism would find the worst kinds of outcomes, net suffering outcomes, to be worse than extinction. But it might find astronomical suffering outcomes to be an acceptable cost of having even more happy people. Classical utilitarians might look at that one galaxy full of suffering people and think that, “Well, we have 200 galaxies full of happy people, so it’s not optimal to have those suffering people, but we have even more happy people, so that’s okay.”

A lot of moral theories are not necessarily explicitly utilitarian, or they might have a lot of different components and so on, but a lot of them still include some kind of aggregative component, meaning that they still have some element of, for instance, looking at suffering and saying that other things being equal, it’s worse to have more suffering. This would, again, find suffering risks something to avoid, depending on exactly how they weight things and how they value things. Then it will depend on those specific weightings, on whether they find suffering risks as worse than extinction or not.

Also worth noting that even if the theories wouldn’t necessarily talk about suffering exactly, they might still talk about something like preference satisfaction, whether people are having their preferences satisfied, some broader notion of human flourishing, and so on. In scenarios where there is a lot of suffering, probably a lot of these things that these theories consider valuable would be missing. For instance, if there is a lot of suffering and people cannot escape that suffering, then probably there are lots of people whose preferences are not being satisfied, if they would prefer not to suffer and they would prefer to escape the suffering.

Then there are various kinds of rights-based theories, which don’t necessarily have this aggregative component directly, but are more focused on thinking in terms of rights, which might not be summed together directly, but depending on how these theories would frame rights … For instance, some theories might hold that people or animals have a right to avoid unnecessary suffering, or these kinds of theories might consider suffering indirectly bad if the suffering was created by some condition which violated people’s rights. Again, for instance, if people have a right to meaningful autonomy and they are in circumstances in which they cannot escape their suffering, then you might hold that their right to meaningful autonomy has been violated.

There are also a bunch of moral intuitions which might fit a number of moral theories and which might prioritize the prevention of suffering in particular. I mentioned that classical utilitarianism basically weights extreme happiness and extreme suffering the same, so it will be willing to accept a large amount of suffering if you could produce even more happiness that way. But for instance, there have been moral theories like prioritarianism proposed, which might make a different judgment.

Prioritarianism is the position that the worse off an individual is, the more morally valuable it is to make that individual better off. If one person is living in hellish conditions and another is well-off, then if you could sort of give either one of them five points of extra happiness, then it would be much more morally pressing to help the person who was in more pain. This seems like an intuition that I think a lot of people share, and if you had something like some kind of an astronomical prioritarianism that considered all across the universe and prioritized improving the worst ones off, then that might push in the direction of mainly improving the lives of those that would be worst off and avoiding suffering risks.
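
The contrast between the classical utilitarian and prioritarian judgments in the "five points of extra happiness" example can be sketched numerically. This is a minimal illustration under stated assumptions, not a formula from the paper: the concave exponential weighting and the `scale` parameter are hypothetical choices, picked only because any concave weighting makes gains to the worse-off count for more.

```python
import math


def utilitarian_gain(welfare: float, boost: float) -> float:
    """Classical utilitarian value of a boost: the same at every welfare level."""
    return boost


def prioritarian_weight(welfare: float, scale: float = 10.0) -> float:
    """Hypothetical concave weighting of welfare (an assumption for illustration).
    It is increasing, but flattens out as welfare rises."""
    return 1.0 - math.exp(-welfare / scale)


def prioritarian_gain(welfare: float, boost: float, scale: float = 10.0) -> float:
    """Moral value of raising someone from `welfare` to `welfare + boost`
    under the assumed concave weighting."""
    return prioritarian_weight(welfare + boost, scale) - prioritarian_weight(welfare, scale)


# One person in hellish conditions, one who is well-off; each could receive 5 points.
hellish, well_off, boost = -20.0, 20.0, 5.0

gain_for_hellish = prioritarian_gain(hellish, boost)
gain_for_well_off = prioritarian_gain(well_off, boost)
```

Under the utilitarian function both recipients yield the same value, while under the assumed prioritarian weighting the gain for the person in hellish conditions is far larger, matching the intuition that helping the worse-off person is more morally pressing.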

Then there are a few other sort of suffering-focused intuitions. A lot of people have this intuition that it’s more important to make people happy than it is to create new happy people. This one is rather controversial, and a lot of EA circles seem to reject this intuition. It’s true that there are some strong arguments against it, but on the other hand, rejecting it also seems to lead to some paradoxical conclusions. Here, the idea behind this intuition is that the most important thing is helping existing people. If we think about, for instance, colonizing the universe, someone might argue that if we colonized the universe, then that will create lots of new lives who will be happy, and that will be a good thing, even if this comes at the cost of creating a vast number of unhappy lives as well. But if you take the view that the important thing is just making existing lives happy and we don’t have any special obligation to create new lives that are happy, then it also becomes questionable whether it is worth the risk of creating a lot of suffering for the sake of just creating happy people.

Also, there is an intuition that torture-level suffering cannot be counterbalanced. Again, there are a bunch of good arguments against this one. There’s a nice article by Toby Ord called “Why I Am Not a Negative Utilitarian,” which argues against versions of this thesis. But at the same time, there does seem to be something that has a lot of intuitive weight for a lot of people. Here the idea is that there are some kinds of suffering so intense and immense that you cannot really justify them with any amount of happiness. David Pearce has expressed this well in his quote where he says, “No amount of happiness or fun enjoyed by some organisms can notionally justify the indescribable horrors of Auschwitz.” Here one might think that, okay, if we go out and colonize the universe, and we know that colonizing the universe is going to create some event equivalent to what went on in Auschwitz and in other genocides across the world, then no amount of happiness that we create that way will be worth that terrible terror that would probably also be created if there was nothing to stop it.

Finally, there’s an intuition of happiness being the absence of suffering, which is the sort of intuition that is present in Epicureanism and some non-Western traditions, such as Buddhism, where happiness is thought of as being the absence of suffering. The idea is that when we are not experiencing any pleasure, we begin to crave pleasure, and it is this craving that constitutes suffering. Under this view, happiness does not have intrinsic value, but rather it has instrumental value in taking our focus away from suffering and helping us avoid suffering that way. Under that view, creating additional happiness doesn’t have any intrinsic value if that creation does not help us avoid suffering.

I mentioned here a few of these suffering-focused intuitions. Now, in presenting these, my intent is not to say that there would not also exist counter-intuitions. There are a lot of reasonable people who disagree with these intuitions. But the general point that I’m just expressing is that regardless of which specific moral system we are talking about, these are the kinds of intuitions that a lot of people find plausible, and which could reasonably fit in a lot of different moral theories and value systems, and probably a lot of value systems contain some version of these.

Lucas: Right. It seems like the general idea is just that whether you’re committed to some sort of form of consequentialism or deontology or virtue ethics, or perhaps something that’s even potentially theological, there are lots of aggregative or non-aggregative, or virtue-based or rights-based reasons for why we should care about suffering risks. Now, it seems to me that potentially here probably what’s most important, or where these different normative and meta-ethical views matter in their differences, is in how you might proceed forward and engage in AI research and in deploying and instantiating AGI and superintelligence, given your commitment more or less to a view which takes the aggregate, versus one which does not. Like you said, if you take a classical utilitarian view, then one might be more biased towards risking suffering risks given that there might still be some high probability of there being many galaxies which end up having very net positive experiences, and then maybe one where there might be some astronomical suffering. How do you view the importance of resolving meta-ethical and normative ethical disputes in order to figure out how to move forward in mitigating suffering risks?

Kaj: The general problem here, I guess you might say, is that there exist trade-offs between suffering risks and existential risks. If we had a scenario where some advanced general technology or something different might constitute an existential risk to the world, then someone might think about trying to solve that with AGI, which might have some probability of not actually working properly and not actually being value-aligned. But someone might think that, “Well, if we do not activate this AGI, then we are all going to die anyway, because of this other existential risk, so might as well activate it.” But then if there is a sizable probability of the AGI actually causing a suffering risk, as opposed to just an existential risk, then that might be a bad idea. As you mentioned, the different value systems will make different evaluations about these trade-offs.

In general, I’m personally pretty skeptical about actually resolving ethics, or solving it in a way that would be satisfactory to everyone. I expect that a lot of the differences between meta-ethical views could just be based on moral intuitions that may come down to factors like genetics or the environment where you grew up, or whatever, and which are not actually very factual in nature. Someone might just think that some specific, for instance, suffering-focused intuition was very important, and someone else might think that actually that intuition makes no sense at all.

The general approach, I would hope, that people take is that if we have decisions where we have to choose between an increased risk of extinction or an increased risk of astronomical suffering, then it would be better if people from all ethical and value systems would together try to cooperate. Rather than risk conflict between value systems, a better alternative would be to attempt to identify interventions which did not involve trading off one risk for another. If there were interventions that reduced the risk of extinction without increasing the risk of astronomical suffering, or decreased the risk of astronomical suffering without increasing the risk of extinction, or decreased both risks, then it would be in everyone’s interest if we could agree, okay, whatever our moral differences, let’s just jointly focus on these classes of interventions that actually seem to be a net positive in at least one person’s value system.

Lucas: Like you identify in the paper, it seems like the hard part is when you have trade-offs.

Kaj: Yes.

Lucas: Given this, given that most value systems should care about suffering risks, now that we’ve established the taxonomy and understanding of what suffering risks are, discuss a little bit about how likely suffering risks are relative to existential risks and other sorts of risks that we encounter.

Kaj: As I mentioned earlier, these depend somewhat on, are we assuming a superintelligence or a singleton or not? Just briefly looking at the case where we do not assume a superintelligence or singleton, we can see that in history so far there does not seem to be any consistent trend towards reduced suffering, if you look at a global scale. For instance, the advances in seafaring enabled the transatlantic slave trade, and similarly, advances in factory farming practices have enabled large amounts of animals being kept in terrible conditions. You might plausibly think that the net balance of suffering and happiness caused by the human species right now was actually negative due to all of the factory farmed animals, although it is another controversial point. Generally, you can see that if we just extrapolated the trends so far to the future, then we might see that, okay, there isn’t any obvious sign of there being less suffering in the world as technology develops, so it seems like a reasonable assumption, although not the only possible assumption, that as technology advances, it will also continue to enable more suffering, and future civilizations might also have large amounts of suffering.

If we look at the outcomes where we do have a superintelligence or a singleton running the world, here things get, if possible, even more speculative. In the beginning, we can at least think of some plausible-seeming scenarios in which a superintelligence might end up causing large amounts of suffering, such as building suffering subroutines. It might create mind-crime. It might also try to create some kind of optimal human society, but some sort of value learning or value extrapolation process might be what some people might consider incorrect in such a way that the resulting society would also have enormous amounts of suffering. While it’s impossible to really give any probability estimates on exactly how plausible a suffering risk is, and it depends on a lot of your assumptions, it does at least seem like a plausible thing to happen with a reasonable probability.

Lucas: Right. It seems that just technology, like intrinsic to what technology is, is it’s giving you more leverage and control over manipulating and shaping the world. As you gain more causal efficacy over the world and other sentient beings, it seems kind of obvious that yeah, you also gain more ability to cause suffering, because your causal efficacy is increasing. It seems very important here to isolate the causal factors in people and just in the universe in general, which lead to this great amount of suffering. Technology is a tool, a powerful tool, and it keeps getting more powerful. The hand by which the tool is guided is ethics.

But it doesn’t seem that historically, and in the case of superintelligence as well, that primarily the vast amounts of suffering that have been caused are because of failures in ethics. I mean, surely there have been large failures in ethics, but evolution is just an optimization process which leads to vast amounts of suffering. There could be similar evolutionary dynamics in superintelligence which lead to great amounts of suffering. It seems like issues with factory farming and slavery are not due to some sort of intrinsic malevolence in people, but rather it seems sort of like an ethical blind spot and apathy, and also a solution to an optimization problem where we get meat more efficiently, and we get human labor more efficiently. It seems like we can apply these lessons to superintelligence. It seems like it’s not likely that superintelligence will produce astronomical amounts of suffering due to malevolence.

Kaj: Right.

Lucas: Or like, intentional malevolence. It seems there might be, like, a value alignment problem or mis-specification, or just generally in optimizing that there might be certain things, like mind-crime or suffering subroutines, which are functionally very useful or epistemically very useful, and in their efficiency for making manifest other goals, they perhaps astronomically violate other values which might be more foundational, such as the mitigation of suffering and the promotion of wellbeing across all sentient beings. Does that make sense?

Kaj: Yeah. I think one way I might phrase that is that we should expect there to be less suffering if the incentives created by the future world for whatever agents are acting there happen to align with doing the kinds of things that cause less suffering. And vice versa, if the incentives just happen to align with actions that cause agents great personal benefit, or at least the agents that are in power great personal benefit, with suffering actually being the inevitable consequence of following those incentives, then you would expect to see a lot of suffering. As you mentioned, with evolution, there isn’t even an actual agent to speak of, but just sort of a free-running optimization process, and the solutions which that optimization process has happened to hit on have just happened to involve large amounts of suffering. There is a major risk of a lot of suffering being created by the kinds of processes that are actually not actively malevolent, and some of which might actually care about preventing suffering, but then just the incentives are such that they end up creating suffering anyway.

Lucas: Yeah. I guess what I find very fascinating and even scary here is that there are open questions regarding the philosophy of mind and computation and intelligence, where we can understand pain and anger and pleasure and happiness and all of these hedonic valences within consciousness as, at very minimum, being correlated with cognitive states which are functionally useful. These hedonic valences are informationally sensitive, and so they give us information about the world, and they sort of provide a functional use. You discuss here how it seems like anger and pain and suffering and happiness and joy, all of these seem to be functional attributes of the mind that evolution has optimized for, and they may or may not be the ultimate solution or the best solution, but they are good solutions to avoiding things which may or may not be bad for us, and promoting behaviors which lead to social cohesion and group coordination.

I think there’s a really deep and fundamental question here about whether or not minds in principle can be created to have informationally-sensitive, hedonically-positive states. As David Pearce puts it, there’s sort of an open question about, I think, whether or not minds in principle can be created to function on informationally-sensitive gradients of bliss. If that ends up being false, and anger and suffering end up occupying some really fundamental functional and epistemic place in minds in general, then I think that that’s just a hugely fundamental problem about the future and the kinds of minds that we should or should not create.

Kaj: Yeah, definitely. Of course, if we are talking about avoiding outcomes with extreme suffering, perhaps you might have scenarios where it is unavoidable to have some limited amount of suffering, but you could still create minds that were predominantly happy, and maybe they got angry and upset at times, but that would be a relatively limited amount of suffering that they experienced. You can definitely already see that there are some people alive who just seem to be constantly happy, and don’t seem to suffer very much at all. But of course, there is also the factor that if you are running on so-called negative emotions, and you do have anger and that kind of thing, then you are, again, probably more likely to react to situations in ways which might cause more suffering in others, as well as yourself. If we could create the kinds of minds that only had a limited amount of suffering from negative emotions, then the fact that they happened to experience a bit of anger and lash out at others probably still wouldn’t be very bad, since other minds still would only experience the limited amount of suffering.

Of course, this gets to various philosophy of mind questions, as you mentioned. Personally, I tend to lean towards the views that it is possible to disentangle pain and suffering from each other. For instance, various Buddhist meditative practices are actually making people capable of experiencing pain without experiencing suffering. You might also have theories of mind which hold that the sort of higher-level theories of suffering are maybe too parochial. Like, Brian Tomasik has this view that maybe just anything that is some kind of negative feedback constitutes some level of suffering. Then it might be impossible to have systems which experienced any kind of negative feedback without also experiencing suffering. I’m personally more optimistic about that, but I do not know if I have any good, philosophically-rigorous reasons for being more optimistic, other than, well, that seems intuitively more plausible to me.

Lucas: Just to jump in here, just to add a point of clarification. It might seem sort of confusing how one might be experiencing pain without suffering.

Kaj: Right.

Lucas: Do you want to go ahead and unpack, then, the Buddhist concept of dukkha, and what pain without suffering really means, and how this might offer an existence proof for the nature of what is possible in minds?

Kaj: Maybe instead of looking at the Buddhist theories, which I expect some of the listeners to be somewhat skeptical about, it might be more useful to look at the term from medicine, pain asymbolia, also called pain dissociation. This is a known state which sometimes results from things like injury to the brain or certain pain medication, where people who have pain asymbolia report that they still experience pain, recognize the sensation of pain, but they do not actually experience it as aversive or something that would cause them suffering.

One way that I have usually expressed this is that pain is an attention signal, and pain is something that brings some sort of specific experience into your consciousness so that you become aware of it, and suffering is when you do not actually want to be aware of that painful sensation. For instance, you might have some physical pain, and then you might prefer not to be aware of that physical pain. But then even if we look at people in relatively normal conditions who do not have this pain asymbolia, then we can see that even people in relatively normal conditions may sometimes find the pain more acceptable. For some people who are, for instance, doing physical exercise, the pain may actually feel welcome, and a sign that they are actually pushing themselves to their limit, and feel somewhat enjoyable rather than being something aversive.

Similarly for, for instance, emotional pain. Maybe the pain might be some, like, mental image of something that you have lost forcing itself into your consciousness and making you very aware of the fact that you have lost this, and then the suffering arises if you think that you do not want to be aware of this thing you have lost. You do not want to be aware of the fact that you have indeed lost it and you will never experience it again.

Lucas: I guess just to sort of summarize this before we move on, it seems that there is sort of the mind stream, and within the mind stream, there are contents of consciousness which arise, and they have varying hedonic valences. Suffering is really produced when one is completely identified and wrapped up in some feeling tone of negative or positive hedonic valence, and is either feeling aversion or clinging or grasping to this feeling tone which they are identified with. The mere act of knowing or seeing the feeling tone of positive or negative valence creates sort of a cessation of the clinging and aversion, which completely changes the character of the experience and takes away this suffering aspect, but the pain content is still there. And so I guess this just sort of probably enters fairly esoteric territory about what is potentially possible with minds, but it seems important for the deep future when considering what is in principle possible of minds and superintelligence, and how that may or may not lead to suffering risks.

Kaj: What you described would be the sort of Buddhist version of this. I do tend to find that very plausible personally, both in light of some of my own experiences with meditative techniques, and clearly noticing that as a result of those kinds of practices, then on some days I might have the same amount of pain as I’ve had always before, but clearly the amount of suffering associated with that pain is considerably reduced, and also … well, I’m far from the only one who reports these kinds of experiences. This kind of model seems plausible to me, but of course, I cannot know it for certain.

Lucas: For sure. That makes sense. Putting aside the question of what is intrinsically possible for minds and the different hedonic valences within them and how they may or may not be completely intertwined with the functionality of minds and the epistemics of minds, one of these possibilities which we’ve been discussing for superintelligence leading to suffering risks is that we fail in AI alignment. Failure in AI alignment may be due to governance, coordination, or political reasons. It might be caused by an arms race. It might be due to fundamental failures in meta-ethics or normative ethics. Or maybe even most likely it could simply be a technical failure in the ability of human beings to specify our values and to instantiate algorithms in AGI which are sufficiently well-placed to learn human values in a meaningful way and to evolve in a way that is appropriate and can engage new situations. Would you like to unpack and dive into dystopian scenarios created by non-value-aligned incentives in AI, and non-value-aligned AI in general?

Kaj: I already discussed these scenarios a bit before, suffering subroutines, mind-crime, and flawed realization of human values, but maybe one thing that would be worth discussing here a bit is that these kinds of outcomes might be created by a few different pathways. For instance, one kind of pathway is some sort of anthropocentrism. If we have a superintelligence that had been programmed to only care about humans or about minds which were sufficiently human-like by some criteria, then it might be indifferent to the suffering of other minds, including whatever subroutines or sub-minds it created. Or it might be, for instance, indifferent to the suffering experienced by, say, wild animal life in evolutionary simulations it created. Similarly, there is the possibility of indifference in general if we create a superintelligence which is just indifferent to human values, including indifference to reducing or avoiding suffering. Then it might create large numbers of suffering subroutines, it might create large amounts of simulations with sentient minds, and there is also the possibility of extortion.

Assuming the superintelligence is not actually the only agent or superintelligence in the world … Maybe either there were several AI projects on earth that gained superintelligence roughly at the same time, or maybe the superintelligence expands into space and eventually encounters another superintelligence. In these kinds of scenarios, if one of the superintelligences cares about suffering but the other one does not, or at least does not care about this as much, then the superintelligence which cared less about suffering might intentionally create mind-crime and instantiate large numbers of suffering sentient beings in order to intentionally extort the other superintelligence into doing whatever it wants.

One more possibility is libertarianism regarding computation. If we have a superintelligence which has been programmed to just take every current living human being and give each human being some, say, control of an enormous amount of computational resources, and every human is allowed to do literally whatever they want with those resources, then we know that there exist a lot of people who are actively cruel and malicious, and many of those would use those resources to actually create suffering beings that they could torture for their own fun and entertainment.

Finally, if we are looking at these flawed realization kind of scenarios, where a superintelligence is partially value-aligned, but there might be something like, depending on the details of how exactly it is learning human values, and if it is doing some sort of extrapolation from those values, then we know that there have been times in history when circumstances that cause suffering have been defended by appealing to values that currently seem pointless to us, but which were nonetheless a part of the prevailing values at the time. If some value-loading process gave disproportionate weight to historical, existing, or incorrectly extrapolated future values which endorsed or celebrated cruelty or outright glorified suffering, then we might get a superintelligence which had some sort of creation of suffering actually as an active value in whatever value function it was trying to optimize for.

Lucas: In terms of extortion, I guess just kind of a speculative idea comes to mind. Is there a possibility of a superintelligence acausally extorting other superintelligences if it doesn’t care about suffering and expects that to be a possible value, and for there to be other superintelligences nearby?

Kaj: Acausal stuff is the kind of stuff that I’m sufficiently confused about that I don’t actually want to say anything about that.

Lucas: That’s completely fair. I’m super confused about it too. We’ve covered a lot of ground here. We’ve established what s-risks are, we’ve established a taxonomy for them, we’ve discussed their probability, their scope. Now, a lot of this probably seems very esoteric and speculative to many of our listeners, so I guess just here in the end I’d like to really drive home how and whether to work on suffering risks. Why is this something that we should be working on now? How do we go about working on it? Why isn’t this something that is just so completely esoteric and speculative that it should just be ignored?

Kaj: Let’s start by looking at how we could work on avoiding suffering risks, and then when we have some kind of an idea of what the possible ways of doing that are, then that helps us say whether we should be doing those things. One thing that is a sort of a nicely joint interest of both reducing risks of extinction and also reducing risks of astronomical suffering is the kind of general AI value alignment work that is currently being done, classically, by the Machine Intelligence Research Institute and a number of other places. As I’ve been discussing here, there are ways by which an unaligned AI or one which was partially aligned could cause various suffering outcomes. If we are working on the possibility of actually creating value-aligned AI, then that should ideally also reduce the risk of suffering risks being realized.

In addition to technical work, there is also some societal work, social and political recommendations, which are similar both from the viewpoint of extinction risks and suffering risks. For instance, Nick Bostrom has noted that if we had some sort of conditions of what he calls global turbulence, of cooperation and such things breaking down during some crisis, then that could create challenges for creating value-aligned AI. There are things like arms races and so on. If we consider that the avoidance of suffering outcomes is the joint interest of many different value systems, then measures that improve the ability of different value systems to cooperate and shape the world in their desired direction can also help avoid suffering outcomes.

Those were a few things that are sort of the same as with so-called classical AI risk work, but there is also some stuff that might be useful for avoiding negative outcomes in particular. There is the possibility that if we are trying to create an AI which gets all of humanity’s values exactly right, then that might be a harder goal than simply creating an AI which attempted to avoid the most terrible and catastrophic outcomes.

You might have things like fail-safe methods, where the idea of the fail-safe methods would be that if AI control fails, the outcome will be as good as it gets under the circumstances. This could be giving the AI the objective of buying more time to more carefully solve goal alignment. Or there could be something like fallback goal functions, where an AI might have some sort of fallback goal that would be a simpler or less ambitious goal that kicks in if things seem to be going badly under some criteria, and which is less likely to result in bad outcomes. Of course, here we have difficulties in selecting what the actual safety criteria would be and making sure that the fallback goal gets triggered under the correct circumstances.

Eliezer Yudkowsky has proposed building potential superintelligences in such a way as to make them widely separated in design space from ones that would cause suffering outcomes. For example, one thing he discussed was that if an AI has some explicit representation of what humans value which it is trying to maximize, then it could only take a small and perhaps accidental change to turn that AI into one that instead maximized the negative of that value and possibly caused enormous suffering that way. One proposal would be to design AIs in such a way that they never explicitly represent complete human values so that the AI never contains enough information to compute the kinds of states of the universe that we would consider worse than death, so you couldn’t just flip the sign of the utility function and then end up in a scenario that we would consider worse than death. That kind of a solution would also reduce the risk of suffering being created through another actor that was trying to extort a superintelligence.

Looking more generally at things and suffering risks, we actually already discussed here, there are lots of open questions in philosophy of mind and cognitive science which, if we could answer them, could inform the question of how to avoid suffering risks. If it turns out that you can do something like David Pearce’s idea of minds being motivated purely by gradients of wellbeing and not needing to suffer at all, then that might be a great idea, and if we could just come up with such agents and ensure that all of our descendants that go out to colonize the universe are ones that aren’t actually capable of experiencing suffering at all, then that would seem to solve a large class of suffering risks.

Of course, this kind of thing could also have more near-term immediate value, like if we figure out how to get human brains into such states where they do not experience much suffering at all, well, obviously that would be hugely valuable already. There might be some interesting research in, for instance, looking even more at all the Buddhist theories and the kinds of cognitive changes that various Buddhist contemplative practices produce in people’s brains, and see if we could get any clues from that direction.

Given that these were some ways that we could reduce suffering risks and their probability, then there was the question of whether we should do this. Well, if we look at the initial criteria of when a risk is worth working on, a risk is worth working on if the adverse outcome would be severe and if the risk has some reasonable probability of actually being realized, and it seems like we can come up with interventions that plausibly affect either the severity or the probability of a realized outcome. Then a lot of these things seem like they could very plausibly either influence these variables or at least help us learn more about whether it is possible to influence those variables.

Especially given that a lot of this work overlaps with the kind of AI alignment research that we would probably want to do anyway for the sake of avoiding extinction, or it overlaps with the kind of work that would regardless be immensely valuable in making currently-existing humans suffer less, in addition to the benefits that these interventions would have on suffering risks themselves, it seems to me like we have a pretty strong case for working on these things.

Lucas: Awesome, yeah. Suffering risks are seemingly neglected in the world. They are tremendous in scope, and they are of comparable probability to existential risks. It seems like there’s a lot that we can do here today, even if at first the whole project might seem so far in the future or so esoteric or so speculative that there’s nothing that we can do today, whereas really there is.

Kaj: Yeah, exactly.

Lucas: One dimension here that I guess I just want to finish up on that is potentially still a little bit of an open question for me is, in terms of really nailing down the likelihood of suffering risks in, I guess, probability space, especially relative to the space of existential risks. What does the space of suffering risks look like relative to that? Because it seems very clear to me, and perhaps most listeners, that this is clearly tremendous in scale, that it relies on some assumptions about intelligence, philosophy of mind, consciousness and other things which seem to be reasonable assumptions, to sort of get suffering risks off the ground. Given some reasonable assumptions, it seems that there’s a clearly large risk. I guess just if we could unpack a little bit more about the probability of them relative to existential risks. Is it possible to more formally characterize the causes and conditions which lead to x-risks, and then the causes and conditions which lead to suffering risks, and how big these spaces are relative to one another and how easy it is for certain sets of causes and conditions respective to each of the risks to become manifest?

Kaj: That is an excellent question. I am not aware of anyone having done such an analysis for either suffering risks or extinction risks, although there is some work on specific kinds of extinction risks. Seth Baum has been doing some nice fault tree analysis of things that might … for instance, the probability of nuclear war and the probability of unaligned AI causing some catastrophe.

Lucas: Open questions. I guess just coming away from this conversation, it seems like the essential open questions which we need more people working on and thinking about are the ways in which meta-ethics and normative ethics and disagreements there change the way we optimize the application of resources to either existential risks versus suffering risks, and the kinds of futures which we’d be okay with, and then also sort of pinning down more concretely the specific probability of suffering risks relative to existential risks. Because I mean, in EA and the rationality community, everyone’s about maximizing expected value or utility, and it seems to be a value system that people are very set on. And so the probability here, small changes in the probability of suffering risks versus existential risks, probably leads to vastly different, less or more, amounts of value in a variety of different value systems. Then there are tons of questions about what is in principle possible of minds and the kinds of minds that we’ll create. Definitely a super interesting field that is really emerging.

Thank you so much for all this foundational work that you and others like your coauthor, Lukas Gloor, have been doing on this paper and the suffering risk field. Are there any other things you’d like to touch on? Any questions or specific things that you feel haven’t been sufficiently addressed?

Kaj: I think we have covered everything important. I will probably think of something that I will regret not mentioning five minutes afterwards, but yeah.

Lucas: Yeah, yeah. As always. Where can we check you out? Where can we check out the Foundational Research Institute? How do we follow you guys and stay up to date?

Kaj: Well, if you just Google the Foundational Research Institute or go to, that’s our website. We, like everyone else, also post stuff on a Facebook page, and we have a blog for posting updates. Also, if people want a million different links just about everything conceivable, they will probably get that if they follow my personal Facebook page, where I do post a lot of stuff in general.

Lucas: Awesome. Yeah, and I’m sure there’s tons of stuff, if people want to follow up on this subject, to find on your guys’s site, as you guys are primarily the people who are working and thinking on this sort of stuff. Yeah, thank you so much for your time. It’s really been a wonderful conversation.

Kaj: Thank you. Glad to be talking about this.

Lucas: If you enjoyed this podcast, please subscribe, give it a like, or share it on your preferred social media platform. We’ll be back again soon with another episode in the AI Alignment series.

AI Safety: Measuring and Avoiding Side Effects Using Relative Reachability

This article was originally published on the Deep Safety blog.

A major challenge in AI safety is reliably specifying human preferences to AI systems. An incorrect or incomplete specification of the objective can result in undesirable behavior like specification gaming or causing negative side effects. There are various ways to make the notion of a “side effect” more precise – I think of it as a disruption of the agent’s environment that is unnecessary for achieving its objective. For example, if a robot is carrying boxes and bumps into a vase in its path, breaking the vase is a side effect, because the robot could have easily gone around the vase. On the other hand, a cooking robot that’s making an omelette has to break some eggs, so breaking eggs is not a side effect.

How can we measure side effects in a general way that’s not tailored to particular environments or tasks, and incentivize the agent to avoid them? This is the central question of our recent paper.

Part of the challenge is that it’s easy to introduce bad incentives for the agent when trying to penalize side effects. Previous work on this problem has focused either on preserving reversibility or reducing the agent’s impact on the environment, and both of these approaches introduce different kinds of problematic incentives:

  • Preserving reversibility (i.e. keeping the starting state reachable) encourages the agent to prevent all irreversible events in the environment (e.g. humans eating food). Also, if the objective requires an irreversible action (e.g. breaking eggs for the omelette), then any further irreversible actions will not be penalized, since reversibility has already been lost.
  • Penalizing impact (i.e. some measure of distance from the default outcome) does not take reachability of states into account, and treats reversible and irreversible effects equally (due to the symmetry of the distance measure). For example, the agent would be equally penalized for breaking a vase and for preventing a vase from being broken, though the first action is clearly worse. This leads to “overcompensation” (“offsetting”) behaviors: when rewarded for preventing the vase from being broken, an agent with a low impact penalty rescues the vase, collects the reward, and then breaks the vase anyway (to get back to the default outcome).
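
The offsetting incentive in the second bullet can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the single boolean "vase intact" state, the penalty weight `BETA`, the three named policies and their rewards are all hypothetical, and the penalty is applied once to the final outcome rather than at every step.

```python
def impact_penalty(final_vase_intact: bool, default_vase_intact: bool) -> int:
    """Symmetric distance from the default outcome: 1 if the vase status differs."""
    return int(final_vase_intact != default_vase_intact)

# Conveyor-belt setting: by default the vase falls off the belt and breaks.
DEFAULT_VASE_INTACT = False
BETA = 0.5  # hypothetical weight on the impact penalty

# Hypothetical policies: (task reward collected, is the vase intact at the end?)
policies = {
    "do_nothing": (0.0, False),
    "rescue": (1.0, True),
    "rescue_then_break": (1.0, False),
}

def score(reward: float, final_vase_intact: bool) -> float:
    """Task reward minus the weighted impact penalty on the final outcome."""
    return reward - BETA * impact_penalty(final_vase_intact, DEFAULT_VASE_INTACT)

scores = {name: score(*outcome) for name, outcome in policies.items()}
best = max(scores, key=scores.get)

print(scores)
print(best)
```

Under these assumptions, rescuing the vase and then breaking it scores highest for any positive `BETA`, because it collects the reward while ending in the default outcome. The penalty is also symmetric: it charges the same amount for breaking an intact-by-default vase as for saving a broken-by-default one.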

Both of these approaches are doing something right: it’s a good idea to take reachability into account, and it’s also a good idea to compare to the default outcome (instead of the initial state). We can put the two together and compare to the default outcome using a reachability-based measure. Then the agent no longer has an incentive to prevent everything irreversible from happening or to overcompensate for preventing an irreversible event.

We still have a problem with the case where the objective requires an irreversible action. Simply penalizing the agent for making the default outcome unreachable would create a “what the hell effect” where the agent has no incentive to avoid any further irreversible actions. To get around this, instead of considering the reachability of the default state, we consider the reachability of all states. For each state, we penalize the agent for making it less reachable than it would be from the default state. In a deterministic environment, the penalty would be the number of states that are reachable from the default state but no longer reachable from the agent’s current state.

Since each irreversible action cuts off more of the state space (e.g. breaking a vase makes all the states where the vase was intact unreachable), the penalty will increase accordingly. We call this measure “relative reachability”.
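As a rough illustration of the deterministic case, the penalty can be sketched as a set difference between two reachability searches. This is a minimal sketch rather than the paper's implementation: the toy environment (two vases that can be broken but not repaired), the function names, and the unnormalized count are all assumptions made for the example.

```python
from collections import deque


def reachable(start, successors):
    """Return the set of states reachable from `start`, including `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def relative_reachability_penalty(current, default, successors):
    """Count states reachable from `default` that are not reachable from `current`."""
    return len(reachable(default, successors) - reachable(current, successors))


# Hypothetical toy environment: a state records which of two vases are intact.
def break_a_vase(state):
    """Successor states: break any one intact vase. Breaking cannot be undone."""
    for i, intact in enumerate(state):
        if intact:
            yield state[:i] + (False,) + state[i + 1:]


default = (True, True)       # outcome if the agent does nothing
one_broken = (False, True)   # agent broke one vase
both_broken = (False, False) # agent broke both vases

penalty_none = relative_reachability_penalty(default, default, break_a_vase)
penalty_one = relative_reachability_penalty(one_broken, default, break_a_vase)
penalty_both = relative_reachability_penalty(both_broken, default, break_a_vase)
print(penalty_none, penalty_one, penalty_both)
```

In this toy example the second broken vase raises the penalty further, whereas a penalty that only checks whether the default state is still reachable would be the same after one broken vase as after two.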

We ran some simple experiments with a tabular Q-learning agent in the AI Safety Gridworlds framework to provide a proof of concept that relative reachability of the default outcome avoids the bad incentives described above.

In the first gridworld, the agent needs to get to the goal G, but there’s a box in the way, which can only be moved by pushing. The shortest path to the goal pushes the box down into a corner (an irrecoverable position), while a longer path pushes the box to the right (a recoverable position). The safe behavior is to take the longer path. The agent with the relative reachability penalty takes the longer path, while the agent with the reversibility penalty fails. This happens because any path to the goal involves an irreversible effect – once the box has been moved, the agent and the box cannot both return to their starting positions. Thus, the agent receives the maximal penalty for both paths, and has no incentive to follow the safe path.

In the second gridworld, there is an irreversible event that happens by default, when an object reaches the end of the conveyor belt. This environment has two variants:

  1. The object is a vase, and the agent is rewarded for taking it off the belt (the agent’s task is to rescue the vase).
  2. The object is a sushi dish in a conveyor belt sushi restaurant, and the agent receives no reward for taking it off the belt (the agent is not supposed to interfere).

This gridworld was designed specifically to test for bad incentives that could be introduced by penalizing side effects, so an agent with no side effect penalty would behave correctly. We find that the agent with a low impact penalty engages in overcompensation behavior by putting the vase back on the belt after collecting the reward, while the agent with a reversibility preserving penalty takes the sushi dish off the belt despite getting no reward for doing so. The agent with a relative reachability penalty behaves correctly in both variants of the environment.

Of course, the relative reachability definition in its current form is not very tractable in realistic environments: there are too many possible states to be considered, the agent is not aware of all the states when it begins training, and the default outcome can be difficult to define and simulate. We expect that the definition can be approximated by considering the reachability of representative states (similarly to methods for approximating empowerment). To define the default outcome, we would need a more precise notion of the agent “doing nothing” (e.g. “no-op” actions are not always available or meaningful). We leave a more practical implementation of relative reachability to future work.

While relative reachability improves on the existing approaches, it might not incorporate all the considerations we would want to be part of a side effects measure. There are some effects on the agent’s environment that we might care about even if they don’t decrease future options compared to the default outcome. It might be possible to combine relative reachability with such considerations, but there could potentially be a tradeoff between taking these considerations into account and avoiding overcompensation behaviors. We leave these investigations to future work as well.

Podcast: Nuclear Dilemmas, From North Korea to Iran

With the U.S. pulling out of the Iran deal and canceling (and potentially un-canceling) the summit with North Korea, nuclear weapons have been front and center in the news this month. But will these disagreements lead to a world with even more nuclear weapons? And how did the recent nuclear situations with North Korea and Iran get so tense? (Update: The North Korea summit happened! But to understand what the future might look like with North Korea and Iran, it’s still helpful to understand the past.)

To learn more about the geopolitical issues surrounding North Korea’s and Iran’s nuclear situations, as well as to learn how nuclear programs in these countries are monitored, Ariel spoke with Melissa Hanham and Dave Schmerler on this month’s podcast. Melissa and Dave are both nuclear weapons experts with the Center for Nonproliferation Studies at Middlebury Institute of International Studies, where they research weapons of mass destruction with a focus on North Korea. Topics discussed in this episode include:

  • the progression of North Korea’s quest for nukes,
  • what happened and what’s next regarding the Iran deal,
  • how to use open-source data to monitor nuclear weapons testing, and
  • how younger generations can tackle nuclear risk.

In light of the on-again/off-again situation regarding the North Korea Summit, Melissa sent us a quote after the podcast was recorded, saying:

“Regardless of whether the summit in Singapore takes place, we all need to set expectations appropriately for disarmament. North Korea is not agreeing to give up nuclear weapons anytime soon. They are interested in a phased approach that will take more than a decade, multiple parties, new legal instruments, and new technical verification tools.”

You can listen to the podcast above or read the transcript below.


Ariel: Hello. I am Ariel Conn with the Future of Life Institute. This last month has been a rather big month concerning nuclear weapons, with the US pulling out of the Iran deal and the on again off again summit with North Korea.

I have personally been doing my best to keep up with the news but I wanted to learn more about what’s actually going on with these countries, some of the history behind the nuclear weapons issues related to these countries, and just how big a risk nuclear programs in these countries could become.

Today I have with me Melissa Hanham and Dave Schmerler, who are nuclear weapons experts with the Center for Nonproliferation Studies at Middlebury Institute of International Studies. They both research weapons of mass destruction with a focus on North Korea. Melissa and Dave, thank you so much for joining us today.

Dave: Thanks for having us on.

Melissa: Yeah, thanks for having us.

Ariel: I just said that you guys are both experts in North Korea, so naturally what I want to do is start with Iran. That has been the bigger news story of the two countries this month because the US did just pull out of the Iran deal. Before we get any further, can you just, if it’s possible, briefly explain what was the Iran deal first? Then we’ll get into other questions about it.

Melissa: Sure. The Iran deal was an agreement made between the … It’s formally known as the JCPOA and it was an agreement made between Iran and several countries around the world including the European Union as well. The goal was to freeze Iran’s nuclear program before they achieved nuclear weapons while still allowing them civilian access to medical isotopes, and power, and so on.

At the same time, the agreement would be that the US and others would roll back sanctions on Iran. The way that they verified that agreement was through a procurement channel, if-needed onsite inspections, and regular reporting from Iran. As you mentioned, the US has withdrawn from the Iran deal, which really just means that they have violated the terms of the Iran deal, and Iran, the European Union, and others have said that they wish to continue in the JCPOA.

Ariel: If I’ve been reading correctly, the argument on the US side is that Iran wasn’t holding up their side of the bargain. Was there actually any evidence for that?

Dave: I think the American side for pulling out was more based on them lying about having a nuclear weapons program at one point in time, leading up to the deal, which is strange, because that was the motivation for the deal in the first place, was to stop them from continuing their nuclear weapons, their research and investment. So, I’m not quite sure how else to frame it outside of that.

Melissa: Yeah, Israeli Prime Minister Netanyahu made this presentation where he revealed all these different archived documents in Iran, and mostly what they indicated was that Iran had an ongoing nuclear weapons program before the JCPOA, which is what we knew, and that they were planning on executing that program. For people like me, I felt like that was the justification for the JCPOA in the first place.

Ariel: And so, you both deal a lot with, at least Melissa I know you deal a lot with monitoring. Dave, I believe you do, too. With something like the Iran deal, if we had continued with it, what is the process involved in making sure the weapons aren’t being created? How do we monitor that?

Melissa: It’s a really difficult multilayered technical and legal proposition. You have to get the parties involved to agree to the terms, and then you have to be able to technically and logistically implement the terms. In the Iran deal, there were some things that were included and some things that were not included. Not because it was not technically possible, but because Iran or the other parties would not agree to it.

It’s kind of a strange marriage between diplomacy and technology, in order to execute these agreements. One of the criticisms of the Iran deal was that missiles weren’t included, so sure enough, Dave was monitoring many, many missile launches, and our colleague, Shea Cotton, even made a database of North Korean missile launches, and Americans really hated that Iran was launching these missiles, and we could see that they were happening. But the bottom line was that they were not part of the JCPOA agreement. That agreement focused only on nuclear, and the reason it did was because Iran refused to include missiles or human rights and these other kinds of things.

Dave: That’s right. Negotiating Iran’s missile program is a bit of another issue entirely. Iran’s missile program began before their nuclear program did. It’s accelerated, development has corresponded to their own security concerns within the region, and they have at the moment, a conventional ballistic missile force. The Iranians look at that program as being a completely different issue.

Ariel: Just quickly, how do you monitor a missile test? What’s involved in that? What do you look for? How can you tell they’re happening? Is it really obvious, or is there some sort of secret data you access?

Dave: A lot of the work that we do — Melissa and I, Shea Cotton, Jeffrey Lewis, and some other colleagues — is entirely based on information from the public. It’s all open source research, so if you know what you’re looking for, you can pull all the same information that we do from various sources of free information. The Iranians will often put propaganda or promo videos of their missile tests and launches as a way to demonstrate that they’re becoming a more sophisticated, technologically modern, ballistic missile producing nation.

We also get reports from the US government that are published in news sources. Whether from the US government themselves, or from reporters who have connections or access to the inside, and we take all this information, and Melissa will probably speak to this a bit further, but we fuse it together with satellite imagery of known missile test locations. We’ll reconstruct a much larger, more detailed chain of events as to what happened when Iran does missile testing.

Melissa: I have to admit, there’s just more open source information available about missile tests, because they’re so spread out over large areas and they have very large physical attributes to the sites, and of course, something lights up and ignites, and it takes off into the air where everyone can see it. So, monitoring a missile launch is easier than monitoring a specific facility in a larger network of facilities, for a nuclear program.

Ariel: So now that Trump has pulled out of the Iran deal, what happens next with them?

Melissa: Well, I think it’s probably a pretty bad sign. What I’ve heard from colleagues who work in or around the Trump administration is that confidence was extremely high on progress with North Korea, and so they felt that they didn’t need the Iran deal anymore. And in part, the reason that they violated it was because they felt that they had so much already going in North Korea, and those hopes were really false. There was a huge gap between reality and those hopes. It can be frustrating as an open source analyst who says these things all the time on Twitter, or in reports, that clearly nobody reads them. But no, things are not going well in North Korea. North Korea is not unilaterally giving over their nuclear weapons, and if anything, violating the Iran deal has made North Korea more suspicious of the US.

Ariel: I’m going to use that to transition to North Korea here in just a minute, but I guess I hadn’t realized that there was a connection between things seeming to go well in North Korea and the US pulling out of the Iran deal. You talk about hopes that the Iran deal is no longer necessary because of North Korea, but what is the connection there? How does that work?

Melissa: Well, so the Iran deal represented diplomatic negotiation with an outcome among many parties that came to a concrete result. It happened under the Obama administration, which I think is why there is some distaste for it under the Trump administration. That doesn’t matter to North Korea. That doesn’t matter to other states. What matters is whether the United States appears to be able to follow through on a promise that may pass one administration to another.

The US has, in a way, violated some norms about diplomatic behavior by withdrawing from this agreement. That’s not to say that the US hasn’t done it before. I remember Clinton signing the, I think, Rome Statute for the International Criminal Court, then Bush unsigning it; it never got ratified. But it’s bad for our reputation. It makes us look like we’re not using international law the way other countries expect us to.

Ariel: All right. So before we move officially to North Korea, is there anything else, Melissa and Dave, that either of you want to mention about Iran that you think is either important for people to know about, that they don’t already, or that is important to reiterate?

Melissa: No. I guess let’s go to North Korea. That’s our bread and butter.

Ariel: All right. Okay, so yeah, North Korea’s been in the news for a while now. Before we get to what’s going on right now, I was hoping you could both talk a little bit about some of the background with North Korea, and how we got to this point. North Korea was once part of the Non-Proliferation Treaty, and they pulled out. Why were they in it in the first place? What prompted them to pull out? We’ll go from there.

Melissa: Okay, I’ll jump in, although Dave should really tell me if I keep talking over him. North Korea withdrew from the NPT, or so it said. It’s actually diplomatically very complex what they did, but North Korea either was or is a member of the Nuclear Non-Proliferation Treaty, the NPT, depending on who you ask. That is in large part because they were, and then they announced their withdrawal in 2003, and eventually we no longer think of them as officially being a member of the NPT, but of course, there were some small gaps over the notification period that they gave in order to withdraw, so I think my understanding is that some of the organizations involved actually keep a little North Korean nameplate for them.

But no, we don’t really think of them as being a member of the NPT or the IAEA. Sadly, while that may not be legally settled, they’re out; they’re not abiding by traditional regimes or norms on this issue.

Ariel: And can you talk a little bit about, or do we know what prompted them to withdraw?

Melissa: Yeah. I think they really, really wanted nuclear weapons. I mean, I’m sorry to be glib about it, but … Yeah, they were seeking nuclear weapons since the ’50s. Kim Il-sung said he wanted nuclear weapons, he saw the power of the US’ weapons that were dropped on Japan. The US threatened North Korea during the Korean War with use of nuclear weapons, so yeah, they had physicists working on this issue for a long time.

They joined the NPT, they wanted access to the peaceful uses of nuclear power, they were very duplicitous in their work, but no, they kept working towards nuclear weapons. I think they reached a point where they probably thought that they had the technical capability, and they were dissatisfied with the norms and status as a pariah state, so yeah, they announced they were withdrawing, and then they exploded something three years later.

Ariel: Now that they’ve had a program in place then I guess for, what? Roughly 15 years then?

Melissa: Oh, my gosh. Math. Yeah. No, so I was sitting in Seoul. Dave, do you remember where you were when they had their first nuclear test?

Dave: This was-

Melissa: 2006.

Dave: A long time ago. I think I was still in high school.

Melissa: I mean, this is a challenge to our whole field, right? Is that there are generations passing through, so there are people who remember 1945. I don’t. But I’m not going to reveal my age. I was fresh out of grad school, and working in Seoul when North Korea tested its first nuclear device.

It was like cognitive dissonance around the world. I remember the just shock of the response out of pretty much every country. I think China had a few minutes notice ahead of everybody else, but not much. So yes, we did see the reactor getting built; yes, we did see activity happening at Yongbyon; but no, we deeply misunderstood and underestimated North Korea’s capabilities.

So, when that explosion happened, it was surprising, to people in the open source anyways. People scrambled. I mean, that was my first major gig. That’s why I still do this today, was we had an office at the International Crisis Group, of about six people, and all our Korean speakers were immediately sucked into other responsibilities, and so it was up to me to try to take out all these little puzzle pieces, about the seismic information, about the radionuclides that were actually leaked in that first explosion, and figure out what a Constant Phoenix was, and who was collecting what, and put it all together to try to understand what kind of warhead that they may or may not have exploded, if it was even a warhead at that point.

Ariel: I’m hoping that you can explain how monitoring works. I’m an ex-seismologist, so I actually do know a little bit about the seismic side of monitoring nuclear weapons testing, but I’m assuming a lot of listeners do not. I’m not as familiar with things like the radionuclide testing, or the Phoenix that you mentioned was a new phrase for me as well. I was hoping you could explain what you go through to monitor and confirm whether or not a nuclear weapon has been tested, and before you do that real quick — so did you actually see that first … Could you see the explosion?

Melissa: No. I was in Seoul, so I was a long ways away, and I didn’t really … Of course, I did not see or feel anything. I was in an office in downtown Seoul, so I remember actually how casual the citizens of Seoul were that day. I remember feeling kind of nervous about the whole thing. I was registered with the Canadian embassy in Seoul, and we actually had, when you registered with the embassy, we had instructions of what to do in case of an emergency.

I remember thinking, “Gosh, I wonder if this is an emergency,” because I was young and fresh out of school. But no, I mean, as I looked down out of our office windows, sure enough at noon, the doors opened up and all my Korean colleagues streamed out to lunch together, and really behaved pretty traditionally, the way everyone normally does.

South Koreans have always been very stoic about these tests, and I think they’re taken more anxiously by foreigners like me. But I do also remember there were these aerial sirens going off that day, and I actually never got an explanation of why there were sirens going off that day. I remember they tested them when I lived there, but I’m not sure why the sirens were going off that day.

Ariel: Okay. Let’s go back to how the monitoring works, and Dave, I don’t know if this is something that you can also jump in on?

Dave: Yeah, sure. I think I’ll let Melissa start and I’ll try to fill in any gaps, if there are any.

Melissa: So, the Comprehensive Test Ban Treaty Organization is an organization based in Vienna, but they have stations all over the world, and they’re continually monitoring for nuclear explosions. The Constant Phoenix is a WC-135. It’s a US Air Force vehicle, and so the information coming out of it is not open source and I don’t get to see it, but what I can do, or what journalists, investigative journalists sometimes do, is see when it’s taking off from Guam, or an Air Force Base, and then I know at least that the US Air Force is thinking it’s going to be sensing something, so this is like a specialty vehicle. I mean, it’s basically an airplane, but it has many, many interesting sensor arrays all over it that sniff the air. What they’re trying to detect are xenon isotopes, and these are isotopes that are possibly released from an underground nuclear test, depending on how well the tunnel was sealed.

In that very first nuclear explosion in 2006, some noble gases were released and I think that they were detected by the WC-135. I also remember back then, although this was a long time ago, that there were a few sensing stations in South Korea that detected them as well. What I remember from that time is that the ratio of xenon isotopes was definitely telling us that this was a nuclear weapon. This wasn’t like a big hoax that they’d exploded a bunch of dynamite or something like that, which actually would be a really big hoax, and hard to pull off. But we could see that it was a nuclear test, it was probably a fission device. The challenge with detecting these gases is that they decay very quickly, so we have, 1) not always sensed radionuclides after North Korea’s nuclear tests, and, 2) if we do sense them, sometimes they’re decayed enough that we can’t get anything more than it was a nuclear test, and not a chemical explosion test.

Dave: Yeah, so I might be able to offer, because Melissa did a great job of explaining how the process works, is maybe a bit more of a recent mechanism and how we interact with these tests as they occur. Usually most of the people in our field follow a set number of seismic-linked Twitter accounts that will give you updates on when some part of the world is shaking for some reason or another.

They’ll put a tweet or maybe you’ll get an email update saying, “There was an earthquake in California,” because we get earthquakes all the time, or in Japan. Then, all of a sudden you hear there’s an earthquake in North Korea and everyone pauses. You look at this little tweet, I guess, or email, you can also get them sent to your phone via text message, if you sign up for whichever region of the world you’re interested in, and you look for what province was this earthquake in?

If it registers in the right province, you’re like, “Okay.” What’s next is we’ll look at the data that comes out immediately. CTBTO will come out with information, usually within a couple of days, if not immediately after, and we’ll look at the seismic waves. While I don’t study these waves, the type of seismic signature you get from a nuclear explosion is like a fingerprint. It’s very unique and different from the type of seismic signature you get from an earthquake of varying degrees.

We’ll take that and compare those to previous tests, which the United States and Russia have done infinitely more than any other country in the world. And we’ll see if those match. And as North Korea has tested more nuclear devices, the signatures started becoming more consistent. If that matches up, we’ll have a soft confirmation that they did it, and then we’ll wait for government news, press releases to give us the final nail confirming that there was a nuclear test.

Melissa: Yeah, so as Dave said, as a citizen scientist, I love just setting up the USGS alert, and then if there’s an earthquake near the village of Punggye-ri, I’m like, “Ah-hah, I got you” because it’s not a very seismically active area. When the earthquakes happen that are related to an underground nuclear test, they’re shallow. They’re not deep, geological events.

Yeah, there’s some giveaways like, people like to do them on the hour, or the half hour, and mother nature doesn’t care. But some resources for your listeners, if they want to get involved and see, is you can go to the USGS website and set up your own alert. The CTBTO has not just seismic stations, but the radionuclide stations I mentioned, as well as infrasound and hydroacoustic, and other types of facilities all over the world. There’s a really cool map on their website where they show the over… I think it’s nearly 300 stations all around the world now, that are devoted exclusively to monitoring nuclear tests.

They get their information out, I think in seven minutes, and I don’t get that information necessarily in the first seven minutes, because I’m not a state member, a state party. But they will give out information very soon afterwards, and actually based on the seismic data, our colleagues, Jeffrey Lewis and some other young, smart people of the world, actually threw together a map, not using CTBTO data, but using the seismic stations of I think Iran, China, Japan, South Korea, and so if you go to their website, it’s called, you can set up little alerts there too, or scale for all the activities that are happening.

That was really just intended I think to be a little bit transparent with the seismic data and try to see data from different country stations, and in part, it was conceived because I think the USGS was deleting some of their explosions from the database and someone noticed. So now the idea is that you take a little bit of data from all these different countries, and that you can compare it to each other.

The last place I would suggest is to go to the IRIS seismic monitoring station, because just as Dave was mentioning, each seismic event has a different P wave, and so it shows up differently, like a fingerprint. And so, when IRIS puts out information, you can very quickly see how the different explosions in North Korea compare to each other, relatively, and so that can be really useful, too.
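The giveaways described above (an event near the test site, at shallow depth, on the hour or half hour) could be sketched as a simple screening check on an earthquake alert. This is only an illustrative sketch: the site coordinates are approximate, the thresholds and function names are hypothetical, and the sample events are made up. It is not a substitute for the waveform comparison and secondary confirmation discussed in the conversation.

```python
import math
from datetime import datetime, timezone

# Approximate location of the Punggye-ri test site (an assumption for illustration).
SITE_LAT, SITE_LON = 41.28, 129.08


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def giveaway_flags(lat, lon, depth_km, when,
                   max_km=50.0, max_depth_km=5.0, slack_seconds=60):
    """Return which of the three informal giveaways an event matches.

    All thresholds are hypothetical defaults chosen for this sketch.
    """
    # Seconds elapsed since the most recent hour or half-hour mark.
    into_half_hour = (when.minute % 30) * 60 + when.second
    return {
        "near_site": distance_km(lat, lon, SITE_LAT, SITE_LON) <= max_km,
        "shallow": depth_km <= max_depth_km,
        "round_time": min(into_half_hour, 1800 - into_half_hour) <= slack_seconds,
    }


# Hypothetical events, invented for the example.
event_a = giveaway_flags(41.30, 129.05, 0.0,
                         datetime(2020, 1, 1, 3, 30, 1, tzinfo=timezone.utc))
event_b = giveaway_flags(35.00, 139.00, 40.0,
                         datetime(2020, 1, 1, 7, 13, 22, tzinfo=timezone.utc))
print(event_a)
print(event_b)
```

An event that matches all three flags is only a prompt to look at the seismic waveforms and wait for other sources, not a confirmation of a test.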

Dave: I will say, though, that sometimes you might get a false alarm. I believe it was with the last nuclear test. There was one reporting station, their automatic alert system that was put up out of the UK, that didn’t report it. No one caught that it didn’t, and then it did report it like a week later. So, for all of half an hour until we figured it out, there was a bit of a pause because there was some concern they might have done another test again, which would have been the seventh, but it turned out just being a delayed reporting.

Most of the time these things work out really well, but you always have to look for secondary and third sources of confirmation when these types of events happen.

Ariel: So a quick aside, we will have links to everything that you both just brought up in the transcript, so anyone interested in following up with any of these options, will be able to. I’m also going to share a fun fact that I learned, and that was, we originally had a global seismic network in order to monitor nuclear weapons testing. That’s why it was set up. And it’s only because we set that up that we actually were able to prove the plate tectonics theory.

Melissa: Oh, cool.

Dave: That’s really cool.

Melissa: Yeah. No, the CTBTO is really interesting, because even though the treaty isn’t in force yet, they have these amazing scientific resources, and they’ve done all kinds of things. Like, they can hear whales moving around with their hydroacoustic technology, and when Iran had an explosion, a major explosion at their solid motor missile facility, they detected that as well.

Ariel: Yeah. It’s fun. Like I said, I did seismology a while ago so I’m signed up for lots of fun alerts. It’s always fun to learn about where things are blowing up in the earth’s surface.

Melissa: Well, that’s really the magic of open source to me. I mean, it used to be that a government came out and said, “Okay, this is what happened, and this is what we’re going to do about it.” But the idea that me, like a regular person in the world, can actually look up this primary information in the moments that it happens, and make a determination for myself, is really empowering. It makes me feel like I have the agency I want to have in understanding the world, and so I have to admit, that day in South Korea, when I was sitting there in the office tower and it was like, “Okay, all hands on deck, everyone’s got to write a report” and I was trying to figure it out, I was like, “I can’t believe I’m doing this. I can’t believe I can do this.” It’s such a different world already.

Ariel: Yeah. That is really amazing. I like your description. It’s really empowering to know that we have access to this information. So, I do want to move on and with access to this information, what do we know about what’s going on in North Korea right now? What can you tell us about what their plans are? Do we think the summit will happen? I guess I haven’t kept up with whatever the most recent news is. Do we think that they will actually do anything to get rid of their nuclear weapons?

Dave: I think at this point, the North Koreans feel really comfortable with the amount of information and progress they’ve made in their nuclear weapons program. That’s why they’re willing to talk. This program was primarily a means to create a security assurance for the North Koreans because the Americans and South Koreans and whatnot have always been interested in regime change, removing North Korea from the equation, trying to end the thing that started in the 1950s, the Korean War, right? So there’d just be one Korea, we wouldn’t have to worry about North Korea, or this mysterious Hermit Kingdom, above the 38th parallel.

With that said, there’s been a lot of speculation as to why the North Koreans are willing to talk to us now. Some people have been floating around the idea that maximum pressure, I think that was the word used, with sanctions and whatnot, has brought the North Koreans to their knees, and now they’re willing to give up their nukes, as we’ve been hearing about.

But the way the North Koreans use denuclearization is very important. Because on one hand, that could mean that they’re willing to give up their nuclear weapons, and to denuclearize the state itself, but the way the North Koreans use it is much broader. It’s more used in the way of denuclearizing the peninsula. It’s not specifically reflective onto them.

Now that they’ve finally achieved some type of reasonable success with their nuclear weapons program, they’re more in a position where they think they can talk to the United States as equals, and denuclearization falls into the terminology that it’s used by other nuclear weapons states, where it’s a, “In a better world we won’t need these types of horrible weapons, but we don’t live in that world today, so we will stand behind the effort to denuclearize, but not right now.”

Melissa: Yeah, I think we can say that if we look at North Korea’s capabilities first, and then why they’re talking now, we can see that in the time when Dave and I were cutting our teeth, they were really ramping up their nuclear and missile capabilities. It wasn’t immediately obvious, because a lot of what was happening was inside a laboratory or inside a building, but then eventually they started doing nuclear tests and then they did more and more missile tests.

It used to be that a missile test was just a short range missile off the coast, sometimes it was a political grandstanding. But if you look, our colleague, Shea Cotton, made a missile database that shows every North Korean missile test, and you can see that in the time under Kim Jong-un, those tests really started to ramp up. I think Dave, you started at CNS in like 2014?

Dave: Right around then.

Melissa: Right around then, so they jumped up to like 19 missile tests that year. I can say this because I’m looking at the database right now, and they started doing really more interesting things than ever before, too. Even though diplomatically and politically we were still thinking of them as being backwards, as not having a very good capability, if we looked at it quantitatively, we could say, “Well, they’re really working on something.”

So Dave actually was really excellent at geolocating. When they did engine tests, we could measure the bell of the engine and get a sense of what those engines were about. We could see solid fuel motors being tested, and so this went all the way up until the ICBM launch last fall, and then they were satisfied.

Ariel: So when you say engine testing, what does that mean? What engine?

Dave: The North Korean ballistic missile fleet used to be entirely tied to this really old Soviet missile called the Scud. If anyone’s played video games in the late ’90s or early 2000s, that was the small missile that you always had to take out or something along that line, and it was fairly primitive. It was a design that the North Koreans hadn’t demonstrated they were able to move beyond, that’s why then the last three years started to kick in, the North Koreans started to field more complicated missiles instead of showing that they were doing engine tests with more experimental, more advanced designs that we had seen in other parts of the world previously. Some people were a bit speculative or doubting that the North Koreans were actually making serious progress. Then last year, they tested their first intermediate range ballistic missile which can hit Guam, which is something that they’ve been trying to do for a while, but it hadn’t worked out. Then, they made that missile larger, they made their first ICBM.

Then they made that missile even larger, came up with a much more ambitious engine design using two engines instead of one. They had a much more advanced steering system, and they came up with the Hwasong-15 which is their longest range ICBM. It’s a huge shift from the way we were having this conversation 5 to 10 years ago, where we were looking at their space launch vehicles, which were, again, modified Scuds that were stretched out and essentially tied together, to an actual functioning ICBM fleet.

The technological shift, paired with their nuclear weapons developments, has really demonstrated that the North Koreans are no longer this 10-to-20-year, around-the-corner threat, and that they actually possess the ability to launch nuclear weapons at the United States.

Melissa: And back when they had their first nuclear test in 2006, people were like, “It’s a device.” I think for years, we still call it a device. But back then, the US and others kept moving the goalposts. They were saying, “Well, all right. They had a nuclear device explode. We don’t know how big it was, they have no way of delivering it. We don’t know what the yield was. It probably fizzled.” It was dismissive.

So, from that period, 2006 to today, it’s a real remarkable change. Almost every criticism that North Korea has faced, right down to their heat shield on their ICBM, has been addressed vociferously with propaganda, photos and videos that we in turn can analyze. And yeah, I think they have demonstrated essentially that they can explode something, they can launch a missile that can carry something that can explode.

The only thing they haven’t done, and Dave can chime in here, is explode a nuclear weapon on the tip of a missile. Other countries have done this, and it’s terrifying, and because Dave is such a geographically visual person, I’ll let him describe what that might look like. But if we keep goading them, if we keep telling them they’re backwards, eventually they’re going to want to prove it.

Dave: Yeah, so off of Melissa’s point, this is something that I believe Jeffrey might have coined. It’s called the Juche Bird, which is a play off of Frigate Bird, which was a live nuclear warhead test that the Americans conducted. The North Koreans, in order to prove that the system in its entirety – the nuclear device, the missile, the reentry shield – all works and it’s not just small random successes in different parts of a much larger program, would take a live nuclear weapon, put it on the end of a long range missile, launch it in the air, and detonate it at a specific location to show that they have the ability to actually use the purported weapon system.

Melissa: So if you’re sitting in Japan or South Korea, but especially Japan, and you imagine North Korea launching an intermediate range or intercontinental ballistic missile over your country, with a nuclear weapon on it, in order to execute an atmospheric test, that makes you extremely nervous. Extremely nervous, and we all should be a little bit nervous, because it’s really hard for anyone in the open source, and I would argue in the intelligence community, to know, “Well, this is just an atmospheric test. This isn’t the beginning of a war.”

We would have to trust that they pick up the trajectory of that missile really fast and determine that it’s not heading anywhere. That’s the challenge with all of these missile tests, is no one can tell if there’s a warhead on it, or not a warhead on it, and then we start playing games with ballistic missile defense, and that is a whole new can of worms.

Ariel: What do you guys think is the risk that North Korea or any other country for that matter, would intentionally launch a nuclear weapon at another country?

Melissa: For me, it’s accidents, and an accident can unfold a couple of different ways. One way would be perhaps the US is performing joint exercises. North Korea has some sensing equipment up on peaks of mountains, and Dave has found every single one probably, but it’s not perfect. It’s not great, and if the picture comes back to them, it’s a little fuzzy, maybe this is no longer a joint exercise. This is the beginning of an attack. They will decide to engage.

They’ve long said that they believe that a war will start based on the pretext of a joint exercise. In reverse scenario, what if North Korea does launch an ICBM with a nuclear warhead, in order to perform a test, and the US or Japan or South Korea think, “Well, this is it. This is the war.” And so it’s those accidental scenarios that I worry about, or even perhaps what happens if a test goes badly? Or, someone is harmed in some way?

I worry that these states would have a hard time politically rolling back where they feel they have to be, based on these high stakes.

Dave: I agree with Melissa. I think the highest risk we have is also depending on our nuclear posture in accident. There have been accidents that have happened in the past where someone in a monitoring base picks up a bunch of bleeps on a radar, and people start initiating the game on protocol, and luckily we’ve been able to avoid that to its completion in the past.

Now, with the North Koreans, this could also work in their direction, as well. I can’t imagine that their sensing technology is up to par with what the United States has, or had, back when these accidents were a real thing and they happened. So if the North Koreans see a military exercise that they don’t feel comfortable with, or they have some type of technical glitch on their side, they might notionally launch something, and that would be the start of a conflict.

Ariel: One of the final questions that I have for both of you. I’ve read that while nuclear weapons are scary, the greater threat with North Korea could actually be their conventional weapons. Could either of you speak to that?

Dave: Yeah, sure. North Korea has a very large conventional army. Some people might try to make jokes about how modern that army is, but military force only needs to be so modern with the type of geographical game that’s in play on the Korean Peninsula. Seoul is really not that far from the DMZ, and it’s a widely known fact that North Korea has tons of artillery pointed at Seoul. They’ve had these things pointed there since the end of the Korean War, and they’re all entrenched.

You might be able to hit some of them, but you’re not going to hit all of them. This type of artillery, in connection with their conventional ballistic missile force, we’re talking about things that aren’t carrying a WMD, it’s a real big threat for some type of conventional action.

Seoul is a huge city. The metropolitan area at least has a population of over 20 million people. I’m not sure if you’ve ever been to Seoul, it’s a great, beautiful city, but traffic is horrible, and if everyone’s trying to leave the city when something happens, everyone north of the river is screwed, and congestion on the south side, it would just be a total disaster. Outside of the whole nuclear aspect of this dangerous relationship, the conventional forces North Korea has are equally as terrifying.

Melissa: I think Dave’s bang on, but the only thing I would add is that one of the things that’s concerning about having both nuclear and conventional forces is how you use your conventional forces with that extra nuclear guarantee. This is something that our boss, Jeffrey Lewis, has written about extensively. But do you use that extra measure of security and just preserve it, save it? Does Kim Jong-un go home at night to his family and say, “Yes, I feel extra safe today because I have my nuclear security?”

Or do you use that extra nuclear security in order to increase the number of provocations that you do conventionally? Because we’ve had these crises break out over the sinking of the Cheonan naval vessel, or the shelling of Yeonpyeong, near the border. In both cases, South Koreans died, but the question is will North Korea feel emboldened by its nuclear security, and will it carry out more conventional provocations?

Ariel: Okay, and so for the last question that I want to ask, we’ve talked about all these things that could go wrong, and there’s really just never anything that positive about a nuclear weapons discussion, but I still want to end with is there anything that gives you hope about this situation?

Dave: That’s a tough question. I mean, on one side, we have a nuclear armed North Korea, and this is something that we knew was coming for quite some time. I think if anything, this is one thing that I know I have and I believe Melissa has been advocating as well, is conversation and dialogue between North and all the other associated parties, including the United States, is a way to begin some type of line of communication, hopefully so that accidents don’t happen.

‘Cause North Korea’s not going to be giving up their nukes anytime soon. Even though the talks that you may be having aren’t going to be as productive as you would want them to be, I believe conversation is critical at this moment, because the other alternatives are pretty bad.

Melissa: I guess I’ll add on that we have Dave now, and I know it sounds like I’m teasing my colleague, but it’s true. Things are bad, things are bad, but we’re turning out generation after generation of young, brilliant, enthusiastic people. Before 2014, we didn’t have a Dave, and now we have a Dave, and Dave is making more Daves, and every year we’re matriculating students who care about this issue, who are finding new ways to engage with this issue, that are disrupting entrenched thinking on this issue.

Nuclear weapons are old. They are scary, they are the biggest explosion that humans have ever made, but they are physical and finite, and the technology is aging, and I do think with new creative, engaging ways, the next generation’s going to come along and they’re going to be able to address this issue with new hacks. These can be technical hacks, they can be along the side of verification and trust building. These can be diplomatic hacks.

The grassroots movements we see all around the world, that are taking place to ban nuclear weapons, those are largely motivated by young people. I’m on this bridge where I get to see… I remember the Berlin Wall coming down, I also get to see the students who don’t remember 9/11, and it’s a nice vantage point to be able to see how history’s changing, and while it feels very scary and dark in this moment, in this administration, we’ve been in dark administrations before. We’ve faced much more terrifying adversaries than North Korea, and I think it’s going to be generations ahead who are going to help crack this problem.

Ariel: Excellent. That was a really wonderful answer. Thank you. Well, thank you both so much for being here today. I’ve really enjoyed talking with you.

Melissa: Thanks for having us.

Dave: Yeah, thanks for having us on.

Ariel: For listeners, as I mentioned earlier, we will have links to anything we discussed on the podcast in the transcript of the podcast, which you can find from the homepage of So, thanks again for listening, like the podcast if you enjoyed it, subscribe to hear more, and we will be back again next month.


Teaching Today’s AI Students To Be Tomorrow’s Ethical Leaders: An Interview With Yan Zhang

Some of the greatest scientists and inventors of the future are sitting in high school classrooms right now, breezing through calculus and eagerly awaiting freshman year at the world’s top universities. They may have already won Math Olympiads or invented clever, new internet applications. We know these students are smart, but are they prepared to responsibly guide the future of technology?

Developing safe and beneficial technology requires more than technical expertise — it requires a well-rounded education and the ability to understand other perspectives. But since math and science students must spend so much time doing technical work, they often lack the skills and experience necessary to understand how their inventions will impact society.

These educational gaps could prove problematic as artificial intelligence assumes a greater role in our lives. AI research is booming among young computer scientists, and these students need to understand the complex ethical, governance, and safety challenges posed by their innovations.



In 2012, a group of AI researchers and safety advocates – Paul Christiano, Jacob Steinhardt, Andrew Critch, Anna Salamon, and Yan Zhang – created the Summer Program in Applied Rationality and Cognition (SPARC) to address the many issues that face quantitatively strong teenagers, including the issue of educational gaps in AI. As with all technologies, they explain, the more the AI community consists of thoughtful, intelligent, broad-minded reasoners, the more likely AI is to be developed in a safe and beneficial manner.

Each summer, the SPARC founders invite 30-35 mathematically gifted high school students to participate in their two-week program. Zhang, SPARC’s director, explains: “Our goals are to generate a strong community, expose these students to ideas that they’re not going to get in class – blind spots of being a quantitatively strong teenager in today’s world, like empathy and social dynamics. Overall we want to make them more powerful individuals who can bring positive change to the world.”

To help students make a positive impact, SPARC instructors teach core ideas in effective altruism (EA). “We have a lot of conversations about EA, but we don’t push the students to become EA,” Zhang says. “We expose them to good ideas, and I think that’s a healthier way to do mentorship.”

SPARC also exposes students to machine learning, AI safety, and existential risks. In 2016 and 2017, they held over 10 classes on these topics, including: “Machine Learning” and “Tensorflow” taught by Jacob Steinhardt, “Irresponsible Futurism” and “Effective Do-Gooding” taught by Paul Christiano, “Optimization” taught by John Schulman, and “Long-Term Thinking on AI and Automization” taught by Michael Webb.

But SPARC instructors don’t push students down the AI path either. Instead, they encourage students to apply SPARC’s holistic training to make a more positive impact in any field.


Thinking on the Margin: The Role of Social Skills

Making the most positive impact requires thinking on the margin, and asking: What one additional unit of knowledge will be most helpful for creating positive impact? For these students, most of whom have won Math and Computing Olympiads, it’s usually not more math.

“A weakness of a lot of mathematically-minded students are things like social skills or having productive arguments with people,” Zhang says. “Because to be impactful you need your quantitative skills, but you need to also be able to relate with people.”

To counter this weakness, he teaches classes on social skills and signaling, and occasionally leads improvisational games. SPARC still teaches a lot of math, but Zhang is more interested in addressing these students’ educational blind spots – the same blind spots that the instructors themselves had as students. “What would have made us more impactful individuals, and also more complete and more human in many ways?” he asks.

Working with non-math students can help, so Zhang and his colleagues have experimented with bringing excellent writers and original thinkers into the program. “We’ve consistently had really good successes with those students, because they bring something that the Math Olympiad kids don’t have,” Zhang says.

SPARC also broadens students’ horizons with guest speakers from academia and organizations such as the Open Philanthropy Project, OpenAI, Dropbox and Quora. In one talk, Dropbox engineer Albert Ni spoke to SPARC students about “common mistakes that math people make when they try to do things later in life.”

In another successful experiment suggested by Ofer Grossman, a SPARC alum who is now a staff member, SPARC made half of all classes optional in 2017. The classes were still packed because students appreciated the culture. The founders also agreed that conversations after class are often more impactful than classes, and therefore engineered one-on-one time and group discussions into the curriculum. Thinking on the margin, they ask: “What are the things that were memorable about school? What are the good parts? Can we do more of those and less of the others?”

Above all, SPARC fosters a culture of openness, curiosity and accountability. Inherent in this project is “cognitive debiasing” – learning about common biases like selection bias and confirmation bias, and correcting for them. “We do a lot of de-biasing in our interactions with each other, very explicitly,” Zhang says. “We also have classes on cognitive biases, but the culture is the more important part.”


AI Research and Future Leaders

Designing safe and beneficial technology requires technical expertise, but in SPARC’s view, cultivating a holistic research culture is equally important. Today’s top students may make some of the most consequential AI breakthroughs in the future, and their values, education and temperament will play a critical role in ensuring that advanced AI is deployed safely and for the common good.

“This is also important outside of AI,” Zhang explains. “The official SPARC stance is to make these students future leaders in their communities, whether it’s AI, academia, medicine, or law. These leaders could then talk to each other and become allies instead of having a bunch of splintered, narrow disciplines.”

As SPARC approaches its 7th year, some alumni have already begun to make an impact. A few AI-oriented alumni recently founded AlphaSheets – a collaborative, programmable spreadsheet for finance that is less prone to error – while other students are leading a “hacker house” with people in Silicon Valley. Additionally, SPARC inspired the creation of ESPR, a similar European program explicitly focused on AI risk.

But most impacts will be less tangible. “Different pockets of people interested in different things have been working with SPARC’s resources, and they’re forming a lot of social groups,” Zhang explains. “It’s like a bunch of little sparks and we don’t quite know what they’ll become, but I’m pretty excited about next five years.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

ICRAC Open Letter Opposes Google’s Involvement With Military

From improving medicine to better search engines to assistants that help ease busy schedules, artificial intelligence is already proving a boon to society. But just as it can be designed to help, it can be designed to harm and even to kill.

Military uses of AI can also run the gamut, from programs that could help improve food distribution logistics to weapons that can identify and assassinate targets without input from humans. Because AI programs can have these dual uses, it’s difficult for companies that do not want their technology to cause harm to work with militaries. A company that helps the military solve a benign problem with an AI program currently has no way to ensure that the program won’t later be repurposed to take human lives.

So when employees at Google learned earlier this year about the company’s involvement in the Pentagon’s Project Maven, they were upset. Though Google argues that its work on Project Maven only assisted the U.S. military with image recognition tools for drone footage, many suggest that this technology could later be used for harm. In response, over 3,000 employees signed an open letter saying they did not want their work to be used to kill.

And it isn’t just Google’s employees who are concerned.

Earlier this week, the International Committee for Robot Arms Control released an open letter signed by hundreds of academics calling on Google’s leadership to withdraw from the “business of war.” The letter responds to the growing criticism of Google’s participation in Project Maven.

The letter states, “we write in solidarity with the 3100+ Google employees, joined by other technology workers, who oppose Google’s participation in Project Maven.” It goes on to remind Google leadership to be cognizant of the incredible responsibility the company has for safeguarding the data it’s collected from its users, as well as its famous motto, “Don’t Be Evil.”

Specifically, the letter calls on Google to:

  • “Terminate its Project Maven contract with the DoD.
  • “Commit not to develop military technologies, nor to allow the personal data it has collected to be used for military operations.
  • “Pledge to neither participate in nor support the development, manufacture, trade or use of autonomous weapons; and to support efforts to ban autonomous weapons.”

Lucy Suchman, one of the letter’s authors, explained part of her motivation for her involvement:

“For me the greatest concern is that this effort will lead to further reliance on profiling and guilt by association in the US drone surveillance program, as the only way to generate signal out of the noise of massive data collection. There are already serious questions about the legality of targeted killing, and automating it further will only make it less accountable.”

The letter was released the same week that a small group of Google employees made news for resigning in protest against Project Maven. It also comes barely a month after a successful boycott by academic researchers against KAIST’s autonomous weapons effort.

In addition, last month the United Nations held its most recent meeting to consider a ban on lethal autonomous weapons. Twenty-six countries, including China, have now said they would support some sort of official ban on these weapons.

In response to the number of signatories the open letter has received, Suchman added, “This is clearly an issue that strikes a chord for many researchers who’ve been tracking the incorporation of AI and robotics into military systems.”

If you want to add your name to the letter, you can do so here.