AI Researchers Create Video to Call for Autonomous Weapons Ban at UN

In response to growing concerns about autonomous weapons, a coalition of AI researchers and advocacy organizations released a fictitious video on Monday that depicts a disturbing future in which lethal autonomous weapons have become cheap and ubiquitous.

The video was launched in Geneva, where AI researcher Stuart Russell presented it at an event at the United Nations Convention on Conventional Weapons hosted by the Campaign to Stop Killer Robots.

Russell, in an appearance at the end of the video, warns that the technology described in the film already exists and that the window to act is closing fast.

Support for a ban has been mounting. Just this past week, over 200 Canadian scientists and over 100 Australian scientists in academia and industry penned open letters to Prime Minister Justin Trudeau and Malcolm Turnbull urging them to support the ban. Earlier this summer, over 130 leaders of AI companies signed a letter in support of this week’s discussions. These letters follow a 2015 open letter released by the Future of Life Institute and signed by more than 20,000 AI/Robotics researchers and others, including Elon Musk and Stephen Hawking.

These letters indicate both grave concern and a sense that the opportunity to curtail lethal autonomous weapons is running out.

Noel Sharkey of the International Committee for Robot Arms Control explains, “The Campaign to Stop Killer Robots is not trying to stifle innovation in artificial intelligence and robotics and it does not wish to ban autonomous systems in the civilian or military world. Rather we see an urgent need to prevent automation of the critical functions for selecting targets and applying violent force without human deliberation and to ensure meaningful human control for every attack.”

Drone technology today is very close to having fully autonomous capabilities. And many of the world’s leading AI researchers worry that if these autonomous weapons are ever developed, they could dramatically lower the threshold for armed conflict, ease and cheapen the taking of human life, empower terrorists, and create global instability. The US and other nations have used drones and semi-automated systems to carry out attacks for several years now, but fully removing a human from the loop is at odds with international humanitarian and human rights law.

A ban can exert great power on the trajectory of technological development without needing to stop every instance of misuse. Max Tegmark, MIT Professor and co-founder of the Future of Life Institute, points out, “People’s knee-jerk reaction that bans can’t help isn’t historically accurate: the bioweapon ban created such a powerful stigma that, despite treaty cheating, we have almost no bioterror attacks today and almost all biotech funding is civilian.”

As Toby Walsh, an AI professor at the University of New South Wales, argues: “The academic community has sent a clear and consistent message. Autonomous weapons will be weapons of terror, the perfect tool for those who have no qualms about the terrible uses to which they are put. We need to act now before this future arrives.”

More than 70 countries are participating in the meeting taking place November 13 – 17 organized by the 2016 Fifth Review Conference at the UN, which established a Group of Governmental Experts on lethal autonomous weapons. The meeting is chaired by Ambassador Amandeep Singh Gill of India, and the countries will continue negotiations of what could become an historic international treaty.

For more information about autonomous weapons, see the following resources:

Developing Ethical Priorities for Neurotechnologies and AI

Private companies and military sectors have moved beyond the goal of merely understanding the brain to that of augmenting and manipulating brain function. In particular, companies such as Elon Musk’s Neuralink and Bryan Johnson’s Kernel are hoping to harness advances in computing and artificial intelligence alongside neuroscience to provide new ways to merge our brains with computers.

Musk also sees this as a means to help address both AI safety and human relevance as algorithms outperform humans in one area after another. He has previously stated, “Some high bandwidth interface to the brain will be something that helps achieve a symbiosis between human and machine intelligence and maybe solves the control problem and the usefulness problem.”

In a comment in Nature, 27 people from The Morningside Group outlined four ethical priorities for the emerging space of neurotechnologies and artificial intelligence. The authors include neuroscientists, ethicists and AI engineers from Google, top US and global Universities, and several non-profit research organizations such as AI Now and The Hastings Center.

A Newsweek article describes their concern, “Artificial intelligence could hijack brain-computer interfaces and take control of our minds.” While this is not exactly the warning the Group describes, they do suggest we are in store for some drastic changes:

…we are on a path to a world in which it will be possible to decode people’s mental processes and directly manipulate the brain mechanisms underlying their intentions, emotions and decisions; where individuals could communicate with others simply by thinking; and where powerful computational systems linked directly to people’s brains aid their interactions with the world such that their mental and physical abilities are greatly enhanced.

The authors suggest that although these advances could provide meaningful and beneficial enhancements to the human experience, they could also exacerbate social inequalities, enable more invasive forms of social manipulation, and threaten core fundamentals of what it means to be human. They encourage readers to consider the ramifications of these emerging technologies now.

Referencing the Asilomar AI Principles and other ethical guidelines as a starting point, they call for a new set of guidelines that specifically address concerns that will emerge as groups like Elon Musk’s startup Neuralink and other companies around the world explore ways to improve the interface between brains and machines. Their recommendations cover four key areas: privacy and consent; agency and identity; augmentation; and bias.

Regarding privacy and consent, they posit that the right to keep neural data private is critical. To this end, they recommend opt-in policies, strict regulation of commercial entities, and the use of blockchain-based techniques to provide transparent control over the use of data. In relation to agency and identity, they recommend that bodily and mental integrity, as well as the ability to choose our actions, be enshrined in international treaties such as the Universal Declaration of Human Rights.

In the area of augmentation, the authors discuss the possibility of an augmentation arms race of soldiers in the pursuit of so-called “super-soldiers” that are more resilient to combat conditions. They recommend that the use of neural technology for military purposes be stringently regulated. And finally, they recommend the exploration of countermeasures, as well as diversity in the design process, in order to prevent widespread bias in machine learning applications.

The ways in which AI will increasingly connect with our bodies and brains pose challenging safety and ethical concerns that will require input from a vast array of people. As Dr. Rafael Yuste of Columbia University, a neuroscientist who co-authored the essay, told STAT, “the ethical thinking has been insufficient. Science is advancing to the point where suddenly you can do things you never would have thought possible.”

MIRI’s November 2017 Newsletter

Eliezer Yudkowsky has written a new book on civilizational dysfunction and outperformance: Inadequate Equilibria: Where and How Civilizations Get Stuck. The full book will be available in print and electronic formats November 16. To preorder the ebook or sign up for updates, visit equilibriabook.com.

We’re posting the full contents online in stages over the next two weeks. The first two chapters are:

  1. Inadequacy and Modesty (discussion: LessWrong, EA Forum, Hacker News)
  2. An Equilibrium of No Free Energy (discussion: LessWrong, EA Forum)

Research updates

General updates

News and links

Podcast: AI Ethics, the Trolley Problem, and a Twitter Ghost Story with Joshua Greene and Iyad Rahwan

As technically challenging as it may be to develop safe and beneficial AI, this challenge also raises some thorny questions regarding ethics and morality, which are just as important to address before AI is too advanced. How do we teach machines to be moral when people can’t even agree on what moral behavior is? And how do we help people deal with and benefit from the tremendous disruptive change that we anticipate from AI?

To help consider these questions, Joshua Greene and Iyad Rawhan kindly agreed to join the podcast. Josh is a professor of psychology and member of the Center for Brain Science Faculty at Harvard University, where his lab has used behavioral and neuroscientific methods to study moral judgment, focusing on the interplay between emotion and reason in moral dilemmas. He’s the author of Moral Tribes: Emotion, Reason and the Gap Between Us and Them. Iyad is the AT&T Career Development Professor and an associate professor of Media Arts and Sciences at the MIT Media Lab, where he leads the Scalable Cooperation group. He created the Moral Machine, which is “a platform for gathering human perspective on moral decisions made by machine intelligence.”

In this episode, we discuss the trolley problem with autonomous cars, how automation will affect rural areas more than cities, how we can address potential inequality issues AI may bring about, and a new way to write ghost stories.

This transcript has been heavily edited for brevity. You can read the full conversation here.

Ariel: How do we anticipate that AI and automation will impact society in the next few years?

Iyad: AI has the potential to extract better value from the data we’re collecting from all the gadgets, devices and sensors around us. We could use this data to make better decisions, whether it’s micro-decisions in an autonomous car that takes us from A to B safer and faster, or whether it’s medical decision-making that enables us to diagnose diseases better, or whether it’s even scientific discovery, allowing us to do science more effectively, efficiently and more intelligently.

Joshua: Artificial intelligence also has the capacity to displace human value. To take the example of using artificial intelligence to diagnose disease. On the one hand it’s wonderful if you have a system that has taken in all of the medical knowledge we have in a way that no human could and uses it to make better decisions. But at the same time that also means that lots of doctors might be out of a job or have a lot less to do. This is the double-edged sword of artificial intelligence, the value it creates and the human value that it displaces.

Ariel: Can you explain what the trolley problem is and how does that connect to this question of what do autonomous vehicles do in situations where there is no good option?

Joshua: One of the original versions of the trolley problem goes like this (we’ll call it “the switch case”): A trolley is headed towards five people and if you don’t do anything, they’re going to be killed, but you can hit a switch that will turn the trolley away from the five and onto a side track. However on that side track, there’s one unsuspecting person and if you do that, that person will be killed.

The question is: is it okay to hit the switch to save those five people’s lives but at the cost of saving one life? In this case, most people tend to say yes. Then we can vary it a little bit. In “the footbridge case,” the situation is different as follows: the trolley is now headed towards five people on a single track, over that track is a footbridge and on that footbridge is a large person wearing a very large backpack. You’re also on the bridge and the only way that you can save those five people from being hit by the trolley is to push that big person off of the footbridge and onto the tracks below.

Assume that it will work, do you think it’s okay to push the guy off the footbridge in order to save five lives? Here, most people say no, and so we have this interesting paradox. In both cases, you’re trading one life for five, yet in one case it seems like it’s the right thing to do, in the other case it seems like it’s the wrong thing to do.

One of the classic objections to these dilemmas is that they’re unrealistic. My view is that the point is not that they’re realistic, but instead that they function like high contrast stimuli. If you’re a vision researcher and you’re using flashing black and white checkerboards to study the visual system, you’re not using that because that’s a typical thing that you look at, you’re using it because it’s something that drives the visual system in a way that reveals its structure and dispositions.

In the same way, these high contrast, extreme moral dilemmas can be useful to sharpen our understanding of the more ordinary processes that we bring to moral thinking.

Iyad: The trolley problem can translate in a cartoonish way to a scenario with which an autonomous car is faced with only two options. The car is going at a speed limit on a street and due to mechanical failure is unable to stop and is going to hit it a group of five pedestrians. The car can swerve and hit a bystander. Should the car swerve or should it just plow through the five pedestrians?

This has a structure similar to the trolley problem because you’re making similar tradeoffs between one and five people and the decision is not being taken on the spot, it’s actually happening at the time of the programming of the car.

There is another complication in which the person being sacrificed to save the greater number of people is the person in the car. Suppose the car can swerve to avoid the five pedestrians but as a result falls off a cliff. That adds another complication especially that programmers are going to have to appeal to customers. If customers don’t feel safe in those cars because of some hypothetical situation that may take place in which they’re sacrificed, that pits the financial incentives against the potentially socially desirable outcome, which can create problems.

A question that raises itself is: Is it going to ever happen? How many times do we face these kinds of situations as we drive today? So the argument goes: these situations are going to be so rare that they are irrelevant and that autonomous cars promise to be substantially safer than human-driven cars that we have today, that the benefits significantly outweigh the costs.

There is obviously truth to this argument, if you take the trolley problem scenario literally. But what the autonomous car version of the trolley problem is doing, is it’s abstracting the tradeoffs that are taking place every microsecond, even now.

Imagine you’re driving on the road and there is a large truck on the lane to your left and as a result you choose to stick a little bit further to the right, just to minimize risk in case this car gets off its lane. Now suppose that there could be a cyclist later on the right hand side, what you’re effectively doing in this small maneuver is slightly reducing risk to yourself but slightly increasing risk to the cyclist. These sorts of decisions are being made millions and millions of times every day.

Ariel: Applying the trolley problem to self-driving cars seems to be forcing the vehicle and thus the programmer of the vehicle to make a judgment call about whose life is more valuable. Can we not come up with some other parameters that don’t say that one person’s life is more valuable than someone else’s?

Joshua: I don’t think that there’s any way to avoid doing that. If you’re a driver, there’s no way to avoid answering the question, how cautious or how aggressive am I going to be. You can not explicitly answer the question; you can say I don’t want to think about that, I just want to drive and see what happens. But you are going to be implicitly answering that question through your behavior, and in the same way, autonomous vehicles can’t avoid the question. Either the people who are designing the machines, training the machines or explicitly programming to behave in certain ways, they are going to do things that are going to affect the outcome.

The cars will constantly be making decisions that inevitably involve value judgments of some kind.

Ariel: To what extent have we actually asked customers what it is that they want from the car? In a completely ethical world, I would like the car to protect the person who’s more vulnerable, who would be the cyclist. In practice, I have a bad feeling I’d probably protect myself.

Iyad: We could say we want to treat everyone equally. On the other hand, you have this self-protective instinct which presumably as a consumer, that’s what you want to buy for yourself and your family. On the other hand you also care for vulnerable people. Different reasonable and moral people can disagree on what the more important factors and considerations should be and I think this is precisely why we have to think about this problem explicitly, rather than leave it purely to – whether it’s programmers or car companies or any particular single group of people – to decide.

Joshua: When we think about problems like this, we have a tendency to binarize it, but it’s not a binary choice between protecting that person or not. It’s really going to be matters of degree. Imagine there’s a cyclist in front of you going at cyclist speed and you either have to wait behind this person for another five minutes creeping along much slower than you would ordinarily go, or you have to swerve into the other lane where there’s oncoming traffic at various distances. Very few people might say I will sit behind this cyclist for 10 minutes before I would go into the other lane and risk damage to myself or another car. But very few people would just blow by the cyclist in a way that really puts that person’s life in peril.

It’s a very hard question to answer because the answers don’t come in the form of something that you can write out in a sentence like, “give priority to the cyclist.” You have to say exactly how much priority in contrast to the other factors that will be in play for this decision. And that’s what makes this problem so interesting and also devilishly hard to think about.

Ariel: Why do you think this is something that we have to deal with when we’re programming something in advance and not something that we as a society should be addressing when it’s people driving?

Iyad: We very much value the convenience of getting from A to B. Our lifetime odds of dying from a car accident is more than 1%, yet somehow, we’ve decided to put up with this because of the convenience. As long as people don’t run through a red light or are not drunk, you don’t really blame them for fatal accidents, we just call them accidents.

But now, thanks to autonomous vehicles that can make decisions and reevaluate situations hundreds or thousands of times per second and adjust their plan and so on – we potentially have the luxury to make those decisions a bit better and I think this is why things are different now.

Joshua: With the human we can say, “Look, you’re driving, you’re responsible, and if you make a mistake and hurt somebody, you’re going to be in trouble and you’re going to pay the cost.” You can’t say that to a car, even a car that’s very smart by 2017 standards. The car isn’t going to be incentivized to behave better – the motivation has to be explicitly trained or programmed in.

Iyad: Economists say you can incentivize the people who make the cars to program them appropriately by fining them and engineering the product liability law in such a way that would hold them accountable and responsible for damages, and this may be the way in which we implement this feedback loop. But I think the question remains what should the standards be against which we hold those cars accountable.

Joshua: Let’s say somebody says, “Okay, I make self-driving cars and I want to make them safe because I know I’m accountable.” They still have to program or train the car. So there’s no avoiding that step, whether it’s done through traditional legalistic incentives or other kinds of incentives.

Ariel: I want to ask about some other research you both do. Iyad you look at how AI and automation impact us and whether that could be influenced by whether we live in smaller towns or larger cities. Can you talk about that?

Iyad: Clearly there are areas that may potentially benefit from AI because it improves productivity and it may lead to greater wealth, but it can also lead to labor displacement. It could cause unemployment if people aren’t able to retool and improve their skills so that they can work with these new AI tools and find employment opportunities.

Are we expected to experience this in a greater way or in a smaller magnitude in smaller versus bigger cities? On one hand there are lots of creative jobs in big cities and, because creativity is so hard to automate, it should make big cities more resilient to these shocks. On the other hand if you go back to Adam Smith and the idea of the division of labor, the whole idea is that individuals become really good at one thing. And this is precisely what spurred urbanization in the first industrial revolution. Even though the system is collectively more productive, individuals may be more automatable in terms of their narrowly-defined tasks.

But when we did the analysis, we found that indeed larger cities are more resilient in relative terms. The preliminary findings are that in bigger cities there is more production that requires social interaction and very advanced skills like scientific and engineering skills. People are better able to complement the machines because they have technical knowledge, so they’re able to use new intelligent tools that are becoming available, but they also work in larger teams on more complex products and services.

Ariel: Josh, you’ve done a lot of work with the idea of “us versus them.” And especially as we’re looking in this country and others at the political situation where it’s increasingly polarized along this line of city versus smaller town, do you anticipate some of what Iyad is talking about making the situation worse?

Joshua: I certainly think we should be prepared for the possibility that it will make the situation worse. The central idea is that as technology advances, you can produce more and more value with less and less human input, although the human input that you need is more and more highly skilled.

If you look at something like Turbo Tax, before you had lots and lots of accountants and many of those accountants are being replaced by a smaller number of programmers and super-expert accountants and people on the business side of these enterprises. If that continues, then yes, you have more and more wealth being concentrated in the hands of the people whose high skill levels complement the technology and there is less and less for people with lower skill levels to do. Not everybody agrees with that argument, but I think it’s one that we ignore at our peril.

Ariel: Do you anticipate that AI itself would become a “them,” or do you think it would be people working with AI versus people who don’t have access to AI?

Joshua: The idea of the AI itself becoming the “them,” I am agnostic as to whether or not that could happen eventually, but this would involve advances in artificial intelligence beyond anything we understand right now. Whereas the problem that we were talking about earlier – humans being divided into a technological, educated, and highly-paid elite as one group and then the larger group of people who are not doing as well financially – that “us-them” divide, you don’t need to look into the future, you can see it right now.

Iyad: I don’t think that the robot will be the “them” on their own, but I think the machines and the people who are very good at using the machines to their advantage, whether it’s economic or otherwise, will collectively be a “them.” It’s the people who are extremely tech savvy, who are using those machines to be more productive or to win wars and things like that. There would be some sort of evolutionary race between human-machine collectives.

Joshua: I think it’s possible that people who are technologically enhanced could have a competitive advantage and set off an economic arms race or perhaps even literal arms race of a kind that we haven’t seen. I hesitate to say, “Oh, that’s definitely going to happen.” I’m just saying it’s a possibility that makes a certain kind of sense.

Ariel: Do either of you have ideas on how we can continue to advance AI and address these divisive issues?

Iyad: There are two new tools at our disposal: experimentation and machine-augmented regulation.

Today, [there are] cars with a bull bar in front of them. These metallic bars at the front of the car increase safety for the passenger in the case of collision, but they have disproportionate impact on other cars, on pedestrians and cyclists, and they’re much more likely to kill them in the case of an accident. As a result, by making this comparison, by identifying that cars with bull bars are worse for certain group, the trade off was not acceptable, and many countries have banned them, for example the UK, Australia, and many European countries.

If there was a similar trade off being caused by a software feature, then, we wouldn’t know unless we allowed for experimentation as well as monitoring – if we looked at the data to identify whether a particular algorithm is making for very safe cars for customers, but at the expense of a particular group.

In some cases, these systems are going to be so sophisticated and the data is going to be so abundant that we won’t be able to observe them and regulate them in time. Think of algorithmic trading programs. No human being is able to observe these things fast enough to intervene, but you could potentially insert another algorithm, a regulatory algorithm or an oversight algorithm, that will observe other AI systems in real time on our behalf, to make sure that they behave.

Joshua: There are two general categories of strategies for making things go well. There are technical solutions to things and then there’s the broader social problem of having a system of governance that can be counted on to produce outcomes that are good for the public in general.

The thing that I’m most worried about is that if we don’t get our politics in order, especially in the United States, we’re not going to have a system in place that’s going to be able to put the public’s interest first. Ultimately, it’s going to come down to the quality of the government that we have in place, and quality means having a government that distributes benefits to people in what we would consider a fair way and takes care to make sure that things don’t go terribly wrong in unexpected ways and generally represents the interests of the people.

I think we should be working on both of these in parallel. We should be developing technical solutions to more localized problems where you need an AI solution to solve a problem created by AI. But I also think we have to get back to basics when it comes to the fundamental principles of our democracy and preserving them.

Ariel: As we move towards smarter and more ubiquitous AI, what worries you most and what are you most excited about?

Joshua: I’m pretty confident that a lot of labor is going to be displaced by artificial intelligence. I think it is going to be enormously politically and socially disruptive, and I think we need to plan now. With self-driving cars especially in the trucking industry, I think that’s going to be the first and most obvious place where millions of people are going to be out of work and it’s not going to be clear what’s going to replace it for them.

I’m excited about the possibility of AI producing value for people in a way that has not been possible before on a large scale. Imagine if anywhere in the world that’s connected to the Internet, you could get the best possible medical diagnosis for whatever is ailing you. That would be an incredible life-saving thing. And as AI teaching and learning systems get more sophisticated, I think it’s possible that people could actually get very high quality educations with minimal human involvement and that means that people all over the world could unlock their potential. And I think that that would be a wonderful transformative thing.

Iyad: I’m worried about the way in which AI and specifically autonomous weapons are going to alter the calculus of war. In order to aggress on another nation, you have to mobilize humans, you have to get political support from the electorate, you have to handle the very difficult process of bringing back people in coffins, and the impact that this has on electorates.

This creates a big check on power and it makes people think very hard about making these kinds of decisions. With AI, when you’re able to wage wars with very little loss to life, especially if you’re a very advanced nation that is at the forefront of this technology, then you have disproportionate power. It’s kind of like a nuclear weapon, but maybe more because it’s much more customizable. It’s not an all out or nothing – you could start all sorts of wars everywhere.

I think it’s going to be a very interesting shift in the way superpowers think about wars and I worry that this might make them trigger happy. I think a new social contract needs to be written so that this power is kept in check and that there’s more thought that goes into this.

On the other hand, I’m very excited about the abundance that will be created by AI technologies. We’re going to optimize the use of our resources in many ways. In health and in transportation, in energy consumption and so on, there are so many examples in recent years in which AI systems are able to discover ways in which even the smartest humans haven’t been able to optimize.

Ariel: One final thought: This podcast is going live on Halloween, so I want to end on a spooky note. And quite conveniently, Iyad’s group has created Shelley, which is a Twitter chatbot that will help you craft scary ghost stories. Shelley is, of course, a nod to Mary Shelley who wrote Frankenstein, which is the most famous horror story about technology. Iyad, I was hoping you could tell us a bit about how Shelley works.

Iyad: Yes, well this is our second attempt at doing something spooky for Halloween. Last year we launched the nightmare machine, which was using deep neural networks and style transfer algorithms to take ordinary photos and convert them into haunted houses and zombie-infested places. And that was quite interesting; it was a lot of fun. More recently, now we’ve launched Shelley, which people can visit on shelley.ai, and it is named after Mary Shelley who authored Frankenstein.

This is a neural network that generates text and it’s been trained on a very large data set of over 100 thousand short horror stories from a subreddit called No Sleep. And so it’s basically got a lot of human knowledge about what makes things spooky and scary, and the nice thing is that it generates part of the story and people can tweet back at it a continuation of the story and then basically take turns with the AI to craft stories. And we feature those stories on the website afterwards. if I’m correct, this is the first collaborative human-AI horror writing exercise ever.

Tokyo AI & Society Symposium

I just spent a week in Japan to speak at the inaugural symposium on AI & Society – my first conference in Asia. It was inspiring to take part in an increasingly global conversation about AI impacts, and interesting to see how the Japanese AI community thinks about these issues. Overall, Japanese researchers seemed more open to discussing controversial topics like human-level AI and consciousness than their Western counterparts. Most people were more interested in near-term AI ethics concerns but also curious about long term problems.

The talks were a mix of English and Japanese with translation available over audio (high quality but still hard to follow when the slides are in Japanese). Here are some tidbits from my favorite talks and sessions.

Danit Gal’s talk on China’s AI policy. She outlined China’s new policy report aiming to lead the world in AI by 2030, and discussed various advantages of collaboration over competition. It was encouraging to see that China’s AI goals include “establishing ethical norms, policies and regulations” and “forming robust AI safety and control mechanisms”. Danit called for international coordination to help ensure that everyone is following compatible concepts of safety and ethics.
danit_collage

Next breakthrough in AI panel (Yasuo Kuniyoshi from U Tokyo, Ryota Kanai from Araya and Marek Rosa from GoodAI). When asked about immediate research problems they wanted the field to focus on, the panelists highlighted intrinsic motivation, embodied cognition, and gradual learning. In the longer term, they encouraged researchers to focus on generalizable solutions and to not shy away from philosophical questions (like defining consciousness). I think this mindset is especially helpful for working on long-term AI safety research, and would be happy to see more of this perspective in the field.

Long-term talks and panel (Francesca Rossi from IBM, Hiroshi Nakagawa from U Tokyo and myself). I gave an overview of AI safety research problems in general and recent papers from my team. Hiroshi provocatively argued that a) AI-driven unemployment is inevitable, and b) we need to solve this problem using AI. Francesca talked about trustworthy AI systems and the value alignment problem. In the panel, we discussed whether long-term problems are a distraction from near-term problems (spoiler: no, both are important to work on), to what extent work on safety for current ML systems can carry over to more advanced systems (high-level insights are more likely to carry over than details), and other fun stuff.

Stephen Cave’s diagram of AI ethics issues. Helpfully color-coded by urgency.
stephen_cave_diagram

Luba Elliott’s talk on AI art. Style transfer has outdone itself with a Google Maps Mona Lisa.
google_maps_mona_lisa

There were two main themes I noticed in the Western presentations. People kept pointing out that AlphaGo is not AGI because it’s not flexible enough to generalize to hexagonal grids and such (this was before AlphaGo Zero came out). Also, the trolley problem was repeatedly brought up as a default ethical question for AI (it would be good to diversify this discussion with some less overused examples).

The conference was very well-organized and a lot of fun. Thanks to the organizers for bringing it together, and to all the great people I got to meet!

[This post originally appeared on the Deep Safety blog. Thanks to Janos Kramar for his feedback.]

Understanding Artificial General Intelligence — An Interview With Hiroshi Yamakawa

Click here to see this page in other languages : Japanese  

Artificial general intelligence (AGI) is something of a holy grail for many artificial intelligence researchers. Today’s narrow AI systems are only capable of specific tasks — such as internet searches, driving a car, or playing a video game — but none of the systems today can do all of these tasks. A single AGI would be able to accomplish a breadth and variety of cognitive tasks similar to that of people.

How close are we to developing AGI? How can we ensure that the power of AGI will benefit the world, and not just the group who develops it first? Will AGI become an existential threat for humanity, or an existential hope?

Dr. Hiroshi Yamakawa, Director of Dwango AI Laboratory, is one of the leading AGI researchers in Japan. Members of the Future of Life Institute sat down with Dr. Yamakawa and spoke with him about AGI and his lab’s progress in developing it. In this interview, Dr. Yamakawa explains how AI can model the human brain, his vision of a future where humans coexist with AGI, and why the Japanese think of AI differently than many in the West.

This transcript has been heavily edited for brevity. You can see the full conversation here.

Why did the Dwango Artificial Intelligence Laboratory make a large investment in [AGI]?

HY: Usable AI that has been developed up to now is essentially for solving specific areas or addressing a particular problem. Rather than just solving a number of problems using experience, AGI, we believe, will be more similar to human intelligence that can solve various problems which were not assumed in the design phase.

What is the advantage of the Whole Brain Architecture approach?

HY: The whole brain architecture is an engineering-based research approach “to create a human-like artificial general intelligence (AGI) by learning from the architecture of the entire brain.” Basically, this approach to building AGI is the integration of artificial neural networks and machine-learning modules while using the brain’s hard wiring as a reference.

I think it will be easier to create an AI with the same behavior and sense of values as humans this way. Even if superintelligence exceeds human intelligence in the near future, it will be comparatively easy to communicate with AI designed to think like a human, and this will be useful as machines and humans continue to live and interact with each other.

General intelligence is a function of many combined, interconnected features produced by learning, so we cannot manually break down these features into individual parts. Because of this difficulty, one meaningful characteristic of whole brain architecture is that though based on brain architecture, it is designed to be a functional assembly of parts that can still be broken down and used.

The functional parts of the brain are to some degree already present in artificial neural networks. It follows that we can build a roadmap of AGI based on these technologies as pieces and parts.

It is now said that convolutional neural networks have essentially outperformed the system/interaction between the temporal lobe and visual cortex in terms of image recognition tasks. At the same time, deep learning has been used to achieve very accurate voice recognition. In humans, the neocortex contains about 14 billion neurons, but about half of those can be partially explained with deep learning. From this point on, we need to come closer to simulating the functions of different structures of the brain, and even without the whole brain architecture, we need to be able to assemble several structures together to reproduce some behavioral level functions. Then, I believe, we’ll have a path to expand that development process to cover the rest of the brain functions, and finally integrate as whole brain..

You also started a non-profit, the Whole Brain Architecture Initiative. How does the non-profit’s role differ from the commercial work?

HY: The Whole Brain Architecture Initiative serves as an organization that helps promote whole brain AI architecture R&D as a whole.

The Basic Ideas of the WBAI:

  • Our vision is to create a world in which AI exists in harmony with humanity.
  • Our mission is to promote the open development of whole brain architecture.
    • In order to make human-friendly artificial general intelligence a public good for all of mankind, we seek to continually expand open, collaborative efforts to develop AI based on an architecture modeled after the brain.
  • Our values are Study, Imagine and Build.
    • Study: Deepen and spread our expertise.
    • Imagine: Broaden our views through public dialogue.
    • Build: Create AGI through open collaboration.

What do you think poses the greatest existential risk to global society in the 21st century?

HY: The risk is not just limited to AI; basically, as human scientific and technological abilities expand, and we become more empowered, risks will increase, too.

Imagine a large field where everyone only has weapons as dangerous as bamboo spears.  The risk that human beings would go extinct by killing each other is extremely small.  On the other hand, as technologies develop, we have bombs in a very small room and no matter who detonates the bomb, we approach a state of annihilation. That risk should concern everyone.

If there are only 10 people in the room, they will mutually monitor and trust each other. However, imagine trusting 10 billion people each with the ability to destroy everyone — such a scenario is beyond our ability to comprehend. Of course, technological development will advance not only offensive power but also defensive power, but it is not easy to have defensive power to contain attacking power at the same time. If scientific and technological development are promoted using artificial intelligence technology, for example, many countries will easily hold intercontinental ballistic fleets, and artificial intelligence can be extremely dangerous to living organisms by using nanotechnology. It could comprise a scenario to extinguish mankind by the development or use of dangerous substances.  Generally speaking, new offensive weapons are developed utilizing the progress of technology, and defensive weapons are developed to neutralize them. Therefore, it is inevitable that periods will exist where the offensive power needed to destroy humanity exceeds its defensive power.

What do you think is the greatest benefit that AGI can bring society?

HY: AGI’s greatest benefit comes from acceleration of development for science and technology. More sophisticated technology will offer solutions for global problems such as environmental issues, food problems and space colonization.

Here I would like to share my vision for the future: “In a desirable future, the happiness of all humans will be balanced against the survival of humankind under the support of superintelligence. In that future, society will be an ecosystem formed by augmented human beings and various public AIs, in what I dub ‘an ecosystem of shared intelligent agents’ (EcSIA).

“Although no human can completely understand EcSIA—it is too complex and vast—humans can control its basic directions. In implementing such control, the grace and wealth that EcSIA affords needs to be properly distributed to everyone.”

Assuming no global catastrophe halts progress, what are the odds of human level AGI in the next 10 years?

HY: I think there’s a possibility that it can happen soon, but taking the average of the estimates of people involved in WBAI, we came up with 2030.

In my current role as the editorial chairman for the Japanese Society of Artificial Intelligence (JSAI) journal, I’m promoting a plan to have a series of discussions starting in the July edition on the theme of “Singularity and AI,” in which we’ll have AI specialists discuss the singularity from a technical viewpoint. I want to help spread calm, technical views on the issue in this way, starting in Japan.

Once human level AGI is achieved, how long would you expect it to take for it to self-modify its way up to massive superhuman intelligence?

HY: If human-level AGI is achieved, it could take on the role of an AI researcher itself. Therefore, immediately after the AGI is built, it could start rapidly cultivating great numbers of AI researcher AI’s that work 24/7, and AI R&D would be drastically accelerated.

What probability do you assign to negative consequences as a result of badly done AI design or operation?

HY: If you include the risk of something like some company losing a lot of money, that will definitely happen.

The range of things that can be done with AI is becoming wider, and the disparity will widen between those who profit from it and those who do not. When that happens, the bad economic situation will give rise to dissatisfaction with the system, and that could create a breeding ground for war and strife. This could be perceived as the evils brought about by capitalism. It’s important that we try to curtail the causes of instability as much as possible.

Is it too soon for us to be researching AI Safety?

HY: I do not think it is at all too early to act for safety, and I think we should progress forward quickly. If possible, we should have several methods to be able to calculate the existential risk brought about by AGI.

Is there anything you think that the AI research community should be more aware of, more open about, or taking more action on?

HY: There are a number of actions that are obviously necessary. Based on this notion, we have established a number of measures like the Japanese Society for Artificial Intelligence Ethics in May 2015 (http://ai-elsi.org/ [in Japanese]), and subsequent Ethical Guidelines for AI researchers (http://ai-elsi.org/archives/514).

A majority of the content of these ethical guidelines expresses the standpoint that researchers should move forward with research that contributes to humanity and society. Additionally, one special characteristic of these guidelines is that the ninth principle listed, a call for ethical compliance of AI itself, states that AI in the future should also abide by the same ethical principles as AI researchers.

Japan, as a society, seems more welcoming of automation. Do you think the Japanese view of AI is different than that in the West?

HY: If we look at things from the standpoint of a moral society, we are all human, and without even looking from the viewpoints of one country or another, in general we should start with the mentality that we have more common characteristics than different.

When looking at AI from the traditional background of Japan, there is a strong influence from beliefs that spirits or “kami” are dwelling in all things. The boundary between living things and humans is relatively unclear, and along the same lines, the same boundaries for AI and robots are unclear. For this reason, in the past, robotic characters like “Tetsuwan Atom” (Astro Boy) and Doraemon were depicted as living and existing in the same world as humans, a theme that has been pervasive in Japanese anime for a long time.

From here on out, we will see humans and AI not as separate entities. Rather I think we will see the appearance of new combinations of AI and humans. Becoming more diverse in this way will certainly improve our chances of survival.

As a very personal view, I think that “surviving intelligence” is something that should be preserved in the future because I feel that it is very fortunate that we have established an intelligent society now, beyond the stormy sea of evolution.   Imagine a future in which our humanity is living with intelligent extraterrestrials after first contact. We will start caring about the survival of humanity but also intelligent extraterrestrials.  If that happens, one future scenario is that our dominant values will be extended to the survival of intelligence rather than the survival of the human race itself.

Hiroshi Yamakawa is the Director of Dwango AI Laboratory, Director and Chief Editor of the Japanese Society for Artificial Intelligence, a Fellow Researcher at the Brain Science Institute at Tamagawa University, and the Chairperson of the Whole Brain Architecture Initiative. He specializes in cognitive architecture, concept acquisition, neuro-computing, and opinion collection. He is one of the leading researchers working on AGI in Japan.

To learn more about Dr. Yamakawa’s work, you can read the full interview transcript here.

This interview was prepared by Eric Gastfriend, Jason Orlosky, Mamiko Matsumoto, Benjamin Peterson, Kazue Evans, and Tucker Davey. Original interview date: April 5, 2017. 

DeepMind’s AlphaGo Zero Becomes Go Champion Without Human Input

DeepMind’s AlphaGo Zero AI program just became the Go champion of the world without human data or guidance. This new system marks a significant technological jump from the AlphaGo program which beat Go champion Lee Sedol in 2016.

The game of Go has been played for more than 2,500 years and is widely viewed as not only a game, but a complex art form.  And a popular one at that. When the artificially intelligent AlphaGo from DeepMind played its first game against Sedol in March 2016, 60 million viewers tuned in to watch in China alone. AlphaGo went on to win four of five games, surprising the world and signifying a major achievement in AI research.

Unlike the chess match between Deep Blue and Garry Kasparov in 1997, AlphaGo did not win by brute force computing alone. The more complex programming of AlphaGo amazed viewers not only with the excellency of its play, but also with its creativity. The infamous “move 37” in game two was described by Go player Fan Hui as “So beautiful.” It was also so unusual that one of the commentators thought it was a mistake. Fan Hui explained, “It’s not a human move. I’ve never seen a human play this move.”

In other words, AlphaGo not only signified an iconic technological achievement, but also shook deeply held social and cultural beliefs about mastery and creativity. Yet, it turns out that AlphaGo was only the beginning. Today, DeepMind announced AlphaGo Zero.

Unlike AlphaGo, AlphaGo Zero was not shown a single human game of Go from which to learn. AlphaGo Zero learned entirely from playing against itself, with no prior knowledge of the game. Although its first games were random, the system used what DeepMind is calling a novel form of reinforcement learning to combine a neural network with a powerful search algorithm to improve each time it played.

In a DeepMind blog about the announcement, the authors write, “This technique is more powerful than previous versions of AlphaGo because it is no longer constrained by the limits of human knowledge. Instead, it is able to learn tabula rasa from the strongest player in the world: AlphaGo itself.”

Though previous AIs from DeepMind have mastered Atari games without human input, as the authors of the Nature article note, “the game of Go, widely viewed as the grand challenge for artificial intelligence, [requires] a precise and sophisticated lookahead in vast search spaces.” While the old Atari games were much more straightforward, the new AI system for AlphaGo Zero had to master the strategy for immediate moves, as well as how to anticipate moves that might be played far into the future.

That this was done all without human demonstrations also takes the program a step beyond the original AlphaGo systems. But in addition to that, this new system learned with fewer input features than its predecessors, and while the original AlphaGo systems required two separate neural networks, AlphaGo Zero was built with only one.

AlphaGo Zero is not marginally better than its predecessor, but in an entirely new class of “superhuman performance” with an intelligence that is notably more general. After just three days of playing against itself (4.9 million times), AlphaGo Zero beat AlphaGo by 100 games to 0. It independently learned the ancient secrets of the masters, but also chose moves and developed strategies never before seen among human players.

Co-founder​ ​and​ ​CEO of ​DeepMind, Demis​ ​Hassabis, said: “It’s amazing to see just how far AlphaGo has come in only two years. AlphaGo Zero is now the strongest version of our program and shows how much progress we can make even with less computing power and zero use of human data.”

Hassabis continued, “Ultimately we want to harness algorithmic breakthroughs like this to help solve all sorts of pressing real world problems like protein folding or designing new materials. If we can make the same progress on these problems that we have with AlphaGo, it has the potential to drive forward human understanding and positively impact all of our lives.”

Podcast: Choosing a Career to Tackle the World’s Biggest Problems with Rob Wiblin and Brenton Mayer

If you want to improve the world as much as possible, what should you do with your career? Should you become a doctor, an engineer or a politician? Should you try to end global poverty, climate change, or international conflict? These are the questions that the research group, 80,000 Hours, tries to answer.

To learn more, I spoke with Rob Wiblin and Brenton Mayer of 80,000 Hours. The following are highlights of the interview, but you can listen to the full podcast above or read the transcript here.

Can you give us some background about 80,000 Hours?

Rob: 80,000 Hours has been around for about six years and started when Benjamin Todd and Will MacAskill wanted to figure out how they could do as much good as possible. They started looking into things like the odds of becoming an MP in the UK or if you became a doctor, how many lives would you save. Pretty quickly, they were learning things that no one else had investigated.

They decided to start 80,000 Hours, which would conduct this research in a more systematic way and share it with people who wanted to do more good with their career.

80,000 hours is roughly the number of hours that you’d work in a full-time professional career. That’s a lot of time, so it pays off to spend quite a while thinking about what you’re going to do with that time.

On the other hand, 80,000 hours is not that long relative to the scale of the problems that the world faces. You can’t tackle everything. You’ve only got one career, so you should be judicious about what problems you try to solve and how you go about solving them.

How do you help people have more of an impact with their careers?

Brenton: The main thing is a career guide. We’ll talk about how to have satisfying careers, how to work on one of the world’s most important problems, how to set yourself up early so that later on you can have a really large impact.

The second part that we do is do career coaching and try to apply advice to individuals.

What is earning to give?

Rob: Earning to give is the career approach where you try to make a lot of money and give it to organizations that can use it to have a really large positive impact. I know people who can make millions of dollars a year doing the thing they love and donate most of that to effective nonprofits, supporting 5, 10, 15, possibly even 20 people to do direct work in their place.

Can you talk about research you’ve been doing regarding the world’s most pressing problems?

Rob: One of the first things we realized is that if you’re trying to help people alive today, your money can go further in the developing world. We just need to scale up solutions to basic health problems and economic issues that have been resolved elsewhere.

Moving beyond that, what other groups in the world are extremely neglected? Factory farmed animals really stand out. There’s very little funding focused on improving farm animal welfare.

The next big idea was, of all the people that we could help, what fraction are alive today? We think that it’s only a small fraction. There’s every reason to think humanity could live for another 100 generations on Earth and possibly even have our descendants alive on other planets.

We worry a lot about existential risks and ways that civilization can go off track and never recover. Thinking about the long-term future of humanity is where a lot of our attention goes and where I think people can have the largest impact with their career.

Regarding artificial intelligence safety, nuclear weapons, biotechnology and climate change, can you consider different ways that people could pursue either careers or “earn to give” options for these fields?

Rob: One would be to specialize in machine learning or other technical work and use those skills to figure out how can we make artificial intelligence aligned with human interests. How do we make the AI do what we want and not things that we don’t intend?

Then there’s the policy and strategy side, trying to answer questions like how do we prevent an AI arms race? Do we want artificial intelligence running military robots? Do we want the government to be more involved in regulating artificial intelligence or less involved? You can also approach this if you have a good understanding of politics, policy, and economics. You can potentially work in government, military or think tanks.

Things like communications, marketing, organization, project management, and fundraising operations — those kinds of things can be quite hard to find skilled, reliable people for. And it can be surprisingly hard to find people who can handle media or do art and design. If you have those skills, you should seriously consider applying to whatever organizations you admire.

[For nuclear weapons] I’m interested in anything that can promote peace between the United States and Russia and China. A war between those groups or an accidental nuclear incident seems like the most likely thing to throw us back to the stone age or even pre-stone age.

I would focus on ensuring that they don’t get false alarms; trying to increase trust between the countries in general and the communication lines so that if there are false alarms, they can quickly diffuse the situation.

The best opportunities [in biotech] are in early surveillance of new diseases. If there’s a new disease coming out, a new flu for example, it takes  a long time to figure out what’s happened.

And when it comes to controlling new diseases, time is really of the essence. If you can pick it up within a few days or weeks, then you have a reasonable shot at quarantining the people and following up with everyone that they’ve met and containing it. Any technologies that we can invent or any policies that will allow us to identify new diseases before they’ve spread to too many people is going to help with both natural pandemics, and also any kind of synthetic biology risks, or accidental releases of diseases from biological researchers.

Brenton: A Wagner and Weitzman paper suggests that there’s about a 10% chance of warming larger than 4.8 degrees Celsius, or a 3% chance of more than 6 degrees Celsius. These are really disastrous outcomes. If you’re interested in climate change, we’re pretty excited about you working on these very bad scenarios. Sensible things to do would be improving our ability to forecast; thinking about the positive feedback loops that might be inherent in Earth’s climate; thinking about how to enhance international corporation.

Rob: It does seem like solar power and storage of energy from solar power is going to have the biggest impact on emissions over at least the next 50 years. Anything that can speed up that transition makes a pretty big contribution.

Rob, can you explain your interest in long-term multigenerational indirect effects and what that means?

Rob: If you’re trying to help people and animals thousands of years in the future, you have to help them through a causal chain that involves changing the behavior of someone today and then that’ll help the next generation and so on.

One way to improve the long-term future of humanity is to do very broad things that improve human capabilities like reducing poverty, improving people’s health, making schools better.

But in a world where the more science and technology we develop, the more power we have to destroy civilization, it becomes less clear that broadly improving human capabilities is a great way to make the future go better. If you improve science and technology, you both improve our ability to solve problems and create new problems.

I think about what technologies can we invent that disproportionately make the world safer rather than more risky. It’s great to improve the technology to discover new diseases quickly and to produce vaccines for them quickly, but I’m less excited about generically pushing forward the life sciences because there’s a lot of potential downsides there as well.

Another way that we can robustly prepare humanity to deal with the long-term future is to have better foresight about the problems that we’re going to face. That’s a very concrete thing you can do that puts humanity in a better position to tackle problems in the future — just being able to anticipate those problems well ahead of time so that we can dedicate resources to averting those problems.

To learn more, visit 80000hours.org and subscribe to Rob’s new podcast.

Explainable AI: a discussion with Dan Weld

Machine learning systems are confusing – just ask any AI researcher. Their deep neural networks operate incredibly quickly, considering thousands of possibilities in seconds before making decisions. The human brain simply can’t keep up.

When people learn to play Go, instructors can challenge their decisions and hear their explanations. Through this interaction, teachers determine the limits of a student’s understanding. But DeepMind’s AlphaGo, which recently beat the world’s champions at Go, can’t answer these questions. When AlphaGo makes an unexpected decision it’s difficult to understand why it made that choice.

Admittedly, the stakes are low with AlphaGo: no one gets hurt if it makes an unexpected move and loses. But deploying intelligent machines that we can’t understand could set a dangerous precedent.

According to computer scientist Dan Weld, understanding and trusting machines is “the key problem to solve” in AI safety, and it’s necessary today. He explains, “Since machine learning is at the core of pretty much every AI success story, it’s really important for us to be able to understand what it is that the machine learned.”

As machine learning (ML) systems assume greater control in healthcare, transportation, and finance, trusting their decisions becomes increasingly important. If researchers can program AIs to explain their decisions and answer questions, as Weld is trying to do, we can better assess whether they will operate safely on their own.

 

Teaching Machines to Explain Themselves

Weld has worked on techniques that expose blind spots in ML systems, or “unknown unknowns.”

When an ML system faces a “known unknown,” it recognizes its uncertainty with the situation. However, when it encounters an unknown unknown, it won’t even recognize that this is an uncertain situation: the system will have extremely high confidence that its result is correct, but it will be wrong. Often, classifiers have this confidence because they were “trained on data that had some regularity in it that’s not reflected in the real world,” Weld says.

Consider an ML system that has been trained to classify images of dogs, but has only been trained on images of brown and black dogs. If this system sees a white dog for the first time, it might confidently assert that it’s not a dog. This is an “unknown unknown” – trained on incomplete data, the classifier has no idea that it’s completely wrong.

ML systems can be programmed to ask for human oversight on known unknowns, but since they don’t recognize unknown unknowns, they can’t easily ask for oversight. Weld’s research team is developing techniques to facilitate this, and he believes that it will complement explainability. “After finding unknown unknowns, the next thing the human probably wants is to know WHY the learner made those mistakes, and why it was so confident,” he explains.

Machines don’t “think” like humans do, but that doesn’t mean researchers can’t engineer them to explain their decisions.

One research group jointly trained a ML classifier to recognize images of birds and generate captions. If the AI recognizes a toucan, for example, the researchers can ask “why.” The neural net can then generate an explanation that the huge, colorful bill indicated a toucan.

While AI developers will prefer certain concepts explained graphically, consumers will need these interactions to involve natural language and more simplified explanations. “Any explanation is built on simplifying assumptions, but there’s a tricky judgment question about what simplifying assumptions are OK to make. Different audiences want different levels of detail,” says Weld.

Explaining the bird’s huge, colorful bill might suffice in image recognition tasks, but with medical diagnoses and financial trades, researchers and users will want more. Like a teacher-student relationship, human and machine should be able to discuss what the AI has learned and where it still needs work, drilling down on details when necessary.

“We want to find mistakes in their reasoning, understand why they’re making these mistakes, and then work towards correcting them,” Weld adds.    

 

Managing Unpredictable Behavior

Yet, ML systems will inevitably surprise researchers. Weld explains, “The system can and will find some way of achieving its objective that’s different from what you thought.”

Governments and businesses can’t afford to deploy highly intelligent AI systems that make unexpected, harmful decisions, especially if these systems control the stock market, power grids, or data privacy. To control this unpredictability, Weld wants to engineer AIs to get approval from humans before executing novel plans.

“It’s a judgment call,” he says. “If it has seen humans executing actions 1-3, then that’s a normal thing. On the other hand, if it comes up with some especially clever way of achieving the goal by executing this rarely-used action number 5, maybe it should run that one by a live human being.”

Over time, this process will create norms for AIs, as they learn which actions are safe and which actions need confirmation.

 

Implications for Current AI Systems

The people that use AI systems often misunderstand their limitations. The doctor using an AI to catch disease hasn’t trained the AI and can’t understand its machine learning. And the AI system, not programmed to explain its decisions, can’t communicate problems to the doctor.

Weld wants to see an AI system that interacts with a pre-trained ML system and learns how the pre-trained system might fail. This system could analyze the doctor’s new diagnostic software to find its blind spots, such as its unknown unknowns. Explainable AI software could then enable the AI to converse with the doctor, answering questions and clarifying uncertainties.

And the applications extend to finance algorithms, personal assistants, self-driving cars, and even predicting recidivism in the legal system, where explanation could help root out bias. ML systems are so complex that humans may never be able to understand them completely, but this back-and-forth dialogue is a crucial first step.

“I think it’s really about trust and how can we build more trustworthy AI systems,” Weld explains. “The more you interact with something, the more shared experience you have, the more you can talk about what’s going on. I think all those things rightfully build trust.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Artificial Intelligence: The Challenge to Keep It Safe

Safety Principle: AI systems should be safe and secure throughout their operational lifetime and verifiably so where applicable and feasible.

When a new car is introduced to the world, it must pass various safety tests to satisfy not just government regulations, but also public expectations. In fact, safety has become a top selling point among car buyers.

And it’s not just cars. Whatever the latest generation of any technology happens to be — from appliances to airplanes — manufacturers know that customers expect their products to be safe from start to finish.

Artificial intelligence is no different. So, on the face of it, the Safety Principle seems like a “no brainer,” as Harvard psychologist Joshua Greene described it. It’s obviously not in anyone’s best interest for an AI product to injure its owner or anyone else. But, as Greene and other researchers highlight below, this principle is much more complex than it appears at first glance.

“This is important, obviously,” said University of Connecticut philosopher Susan Schneider, but she expressed uncertainty about our ability to verify that we can trust a system as it gets increasingly intelligent. She pointed out that at a certain level of intelligence, the AI will be able to rewrite its own code, and with superintelligent systems “we may not even be able to understand the program to begin with.”

What Is AI Safety?

This principle gets to the heart of the AI safety research initiative: how can we ensure safety for a technology that is designed to learn how to modify its own behavior?

Artificial intelligence is designed so that it can learn from interactions with its surroundings and alter its behavior accordingly, which could provide incredible benefits to humanity. Because AI can address so many problems more effectively than people, it has huge potential to improve health and wellbeing for everyone. But it’s not hard to imagine how this technology could go awry. And we don’t need to achieve superintelligence for this to become a problem.

Microsoft’s chatbot, Tay, is a recent example of how an AI can learn negative behavior from its environment, producing results quite the opposite from what its creators had in mind. Meanwhile, the Tesla car accident, in which the vehicle mistook a white truck for a clear sky, offers an example of an AI misunderstanding its surrounding and taking deadly action as a result.

Researchers can try to learn from AI gone astray, but current designs often lack transparency, and much of today’s artificial intelligence is essentially a black box. AI developers can’t always figure out how or why AIs take various actions, and this will likely only grow more challenging as AI becomes more complex.

However, Ian Goodfellow, a research scientist at Google Brain, is hopeful, pointing to efforts already underway to address these concerns.

“Applying traditional security techniques to AI gives us a concrete path to achieving AI safety,” Goodfellow explains. “If we can design a method that prevents even a malicious attacker from causing an AI to take an undesirable action, then it is even less likely that the AI would choose an undesirable action independently.”

AI safety may be a challenge, but there’s no reason to believe it’s insurmountable. So what do other AI experts say about how we can interpret and implement the Safety Principle?

What Does ‘Verifiably’ Mean?

‘Verifiably’ was the word that caught the eye of many researchers as a crucial part of this Principle.

John Havens, an Executive Director with IEEE, first considered the Safety Principle in its entirety, saying,  “I don’t know who wouldn’t say AI systems should be safe and secure. … ‘Throughout their operational lifetime’ is actually the more important part of the sentence, because that’s about sustainability and longevity.”

But then, he added, “My favorite part of the sentence is ‘and verifiably so.’ That is critical. Because that means, even if you and I don’t agree on what ‘safe and secure’ means, but we do agree on verifiability, then you can go, ‘well, here’s my certification, here’s my checklist.’ And I can go, ‘Great, thanks.’ I can look at it, and say, ‘oh, I see you got things 1-10, but what about 11-15?’ Verifiably is a critical part of that sentence.”

AI researcher Susan Craw noted that the Principle “is linked to transparency.” She explained, “Maybe ‘verifiably so’ would be possible with systems if they were a bit more transparent about how they were doing things.”

Greene also noted the complexity and challenge presented by the Principle when he suggested:

“It depends what you mean by ‘verifiably.’ Does ‘verifiably’ mean mathematically, logically proven? That might be impossible. Does ‘verifiably’ mean you’ve taken some measures to show that a good outcome is most likely? If you’re talking about a small risk of a catastrophic outcome, maybe that’s not good enough.”

Safety and Value Alignment

Any consideration of AI safety must also include value alignment: how can we design artificial intelligence that can align with the global diversity of human values, especially taking into account that, often, what we ask for is not necessarily what we want?

“Safety is not just a technical problem,” Patrick Lin, a philosopher at California Polytechnic told me. “If you just make AI that can align perfectly with whatever values you set it to, well the problem is, people can have a range of values, and some of them are bad. Just merely matching AI, aligning it to whatever value you specify I think is not good enough. It’s a good start, it’s a good big picture goal to make AI safe, and the technical element is a big part of it; but again, I think safety also means policy and norm-setting.”

And the value-alignment problem becomes even more of a safety issue as the artificial intelligence gets closer to meeting — and exceeding — human intelligence.

“Consider the example of the Japanese androids that are being developed for elder care,” said Schneider. “They’re not smart; right now, the emphasis is on physical appearance and motor skills. But imagine when one of these androids is actually engaged in elder care … It has to multitask and exhibit cognitive flexibility. … That raises the demand for household assistants that are AGIs. And once you get to the level of artificial general intelligence, it’s harder to control the machines. We can’t even make sure fellow humans have the right goals; why should we think AGI will have values that align with ours, let alone that a superintelligence would.”

Defining Safety

But perhaps it’s time to reconsider the definition of safety, as Lin alluded to above. Havens also requested “words that further explain ‘safe and secure,’” suggesting that we need to expand the definition beyond “physically safe” to “provide increased well being.”

Anca Dragan, an associate professor at UC Berkeley, was particularly interested in the definition of “safe.”

“We all agree that we want our systems to be safe,” said Dragan. “More interesting is what do we mean by ‘safe’, and what are acceptable ways of verifying safety.

“Traditional methods for formal verification that prove (under certain assumptions) that a system will satisfy desired constraints seem difficult to scale to more complex and even learned behavior. Moreover, as AI advances, it becomes less clear what these constraints should be, and it becomes easier to forget important constraints. … we need to rethink what we mean by safe, perhaps building in safety from the get-go as opposed to designing a capable system and adding safety after.”

What Do You Think?

What does it mean for a system to be safe? Does it mean the owner doesn’t get hurt? Are “injuries” limited to physical ailments, or does safety also encompass financial or emotional damage? And what if an AI is being used for self-defense or by the military? Can an AI harm an attacker? How can we ensure that a robot or software program or any other AI system remains verifiably safe throughout its lifetime, even as it continues to learn and develop on its own? How much risk are we willing to accept in order to gain the potential benefits that increasingly intelligent AI — and ultimately superintelligence — could bestow?

This article is part of a series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the discussions about previous principles here.

Podcast: Life 3.0 – Being Human in the Age of Artificial Intelligence

Elon Musk has called it a compelling guide to the challenges and choices in our quest for a great future of life on Earth and beyond, while Stephen Hawking and Ray Kurzweil have referred to it as an introduction and guide to the most important conversation of our time. “It” is Max Tegmark’s new book, Life 3.0: Being Human in the Age of Artificial Intelligence.

Tegmark is a physicist and AI researcher at MIT, and he’s also the president of the Future of Life Institute.

The following interview has been heavily edited for brevity, but you can listen to it in its entirety above or read the full transcript here.

What makes Life 3.0 an important read for anyone who wants to understand and prepare for our future?

There’s been lots of talk about AI disrupting the job market and enabling new weapons, but very few scientists talk seriously about what I think is the elephant in the room: What will happen, once machines outsmart us at all tasks?

Will superhuman artificial intelligence arrive in our lifetime? Can and should it be controlled, and if so, by whom? Can humanity survive in the age of AI? And if so, how can we find meaning and purpose if super-intelligent machines provide for all our needs and make all our contributions superfluous?

I’m optimistic that we can create a great future with AI, but it’s not going to happen automatically. We have to win this race between the growing power of the technology, and the growing wisdom with which we manage it. We don’t want to learn from mistakes. We want to get things right the first time because that might be the only time we have.

There is still a lot of AI researchers who are telling us not to worry. What is your response to them?

There are two very basic questions where the world’s leading AI researchers totally disagree.

One of them is when, if ever, are we going to get super-human general artificial intelligence? Some people think it’s never going to happen or take hundreds of years. Many others think it’s going to happen in decades. The other controversy is what’s going to happen if we ever get beyond human-level AI?

Then there are a lot of very serious AI researchers who think that this could be the best thing ever to happen, but it could also lead to huge problems. It’s really boring to sit around and quibble about whether we should worry or not. What I’m interested in is asking what concretely can we do today that’s going to increase the chances of things going well because that’s all that actually matters.

There’s also a lot of debate about whether people should focus on just near-term risks or just long-term risks.

We should obviously focus on both. What you’re calling the short-term questions, like how for example, do you make computers that are robust, and do what they’re supposed to do and not crash and don’t get hacked. It’s not only something that we absolutely need to solve in the short term as AI gets more and more into society, but it’s also a valuable stepping stone toward tougher questions. How are you ever going to build a super-intelligent machine that you’re confident is going to do what you want, if you can’t even build a laptop that does what you want instead of giving you the blue screen of death or the spinning wheel of doom.

If you want to go far in one direction, first you take one step in that direction.

You mention 12 options for what you think a future world with superintelligence will look like. Could you talk about a couple of the future scenarios? And then what are you hopeful for, and what scares you?

Yeah, I confess, I had a lot of fun brainstorming these different scenarios. When we envision the future, we almost inadvertently obsess about gloomy stuff. Instead, we really need these positive visions to think what kind of society would we like to have if we have enough intelligence at our disposal to eliminate poverty, disease, and so on? If it turns out that AI can help us solve these challenges, what do we want?

If we have very powerful AI systems, it’s crucial that their goals are aligned with our goals. We don’t want to create machines, which are first very excited about helping us, and then later get as bored with us as kids get with Legos.

Finally, what should the goals be that we want these machines to safeguard? There’s obviously no consensus on Earth for that. Should it be Donald Trump’s goals? Hillary Clinton’s goals? ISIS’s goals? Whose goals should it be? How should this be decided? This conversation can’t just be left to tech nerds like myself. It has to involve everybody because it’s everybody’s future that’s at stake here.

If we actually create an AI or multiple AI systems that can do this, what do we do then?

That’s one of those huge questions that everybody should be discussing. Suppose we get machines that can do all our jobs, produce all our goods and services for us. How do you want to distribute this wealth that’s produced? Just because you take care of people materially, doesn’t mean they’re going to be happy. How do you create a society where people can flourish and find meaning and purpose in their lives even if they are not necessary as producers? Even if they don’t need to have jobs?

You have a whole chapter dedicated to the cosmic endowment and what happens in the next billion years and beyond. Why should we care about something so far into the future?

It’s a beautiful idea if our cosmos can continue to wake up more, and life can flourish here on Earth, not just for the next election cycle, but for billions of years and throughout the cosmos. We have over a billion planets in this galaxy alone, which are very nice and habitable. If we think big together, this can be a powerful way to put our differences aside on Earth and unify around the bigger goal of seizing this great opportunity.

If we were to just blow it by some really poor planning with our technology and go extinct, wouldn’t we really have failed in our responsibility.

What do you see as the risks and the benefits of creating an AI that has consciousness?

There is a lot of confusion in this area. If you worry about some machine doing something bad to you, consciousness is a complete red herring. If you’re chased by a heat-seeking missile, you don’t give a hoot whether it has a subjective experience. You wouldn’t say, “Oh I’m not worried about this missile because it’s not conscious.”

If we create very intelligent machines, if you have a helper robot who you can have conversations with and says pretty interesting things. Wouldn’t you want to know if it feels like something to be that helper robot? If it’s conscious, or if it’s just a zombie pretending to have these experiences? If you knew that it felt conscious much like you do, presumably that would put it ethically in a very different situation.

It’s not our universe giving meaning to us, it’s we conscious beings giving meaning to our universe. If there’s nobody experiencing anything, our whole cosmos just goes back to being a giant waste of space. It’s going to be very important for these various reasons to understand what it is about information processing that gives rise to what we call consciousness.

Why and when should we concern ourselves with outcomes that have low probabilities?

I and most of my AI colleagues don’t think that the probability is very low that we will eventually be able to replicate human intelligence in machines. The question isn’t so much “if,” although there are certainly a few detractors out there, the bigger question is “when.”

If we start getting close to the human-level AI, there’s an enormous Pandora’s Box, which we want to open very carefully and just make sure that if we build these very powerful systems, they should have enough safeguards built into them already that some disgruntled ex-boyfriend isn’t going to use that for a vendetta, and some ISIS member isn’t going to use that for their latest plot.

How can the average concerned citizen get more involved in this conversation, so that we can all have a more active voice in guiding the future of humanity and life?

Everybody can contribute! We set up a website, ageofai.org, where we’re encouraging everybody to come and share their ideas for how they would like the future to be. We really need the wisdom of everybody to chart a future worth aiming for. If we don’t know what kind of future we want, we’re not going to get it.

Friendly AI: Aligning Goals

The following is an excerpt from my new book, Life 3.0: Being Human in the Age of Artificial Intelligence. You can join and follow the discussion at ageofai.org.

The more intelligent and powerful machines get, the more important it becomes that their goals are aligned with ours. As long as we build only relatively dumb machines, the question isn’t whether human goals will prevail in the end, but merely how much trouble these machines can cause humanity before we figure out how to solve the goal-alignment problem. If a superintelligence is ever unleashed, however, it will be the other way around: since intelligence is the ability to accomplish goals, a superintelligent AI is by definition much better at accomplishing its goals than we humans are at accomplishing ours, and will therefore prevail.

If you want to experience a machine’s goals trumping yours right now, simply download a state-of-the-art chess engine and try beating it. You never will, and it gets old quickly…

In other words, the real risk with AGI isn’t malice but competence. A superintelligent AI will be extremely good at accomplishing its goals, and if those goals aren’t aligned with ours, we’re in trouble. People don’t think twice about flooding anthills to build hydroelectric dams, so let’s not place humanity in the position of those ants. Most researchers therefore argue that if we ever end up creating superintelligence, then we should make sure it’s what AI-safety pioneer Eliezer Yudkowsky has termed “friendly AI”: AI whose goals are aligned with ours.

Figuring out how to align the goals of a superintelligent AI with our goals isn’t just important, but also hard. In fact, it’s currently an unsolved problem. It splits into three tough sub-problems, each of which is the subject of active research by computer scientists and other thinkers:

1. Making AI learn our goals
2. Making AI adopt our goals
3. Making AI retain our goals

Let’s explore them in turn, deferring the question of what to mean by “our goals” to the next section.

To learn our goals, an AI must figure out not what we do, but why we do it. We humans accomplish this so effortlessly that it’s easy to forget how hard the task is for a computer, and how easy it is to misunderstand. If you ask a future self-driving car to take you to the airport as fast as possible and it takes you literally, you’ll get there chased by helicopters and covered in vomit. If you exclaim “That’s not what I wanted!”, it can justifiably answer: “That’s what you asked for.” The same theme recurs in many famous stories. In the ancient Greek legend, King Midas asked that everything he touched turn to gold, but was disappointed when this prevented him from eating and even more so when he inadvertently turned his daughter to gold. In the stories where a genie grants three wishes, there are many variants for the first two wishes, but the third wish is almost always the same: “please undo the first two wishes, because that’s not what I really wanted.”

All these examples show that to figure out what people really want, you can’t merely go by what they say. You also need a detailed model of the world, including the many shared preferences that we tend to leave unstated because we consider them obvious, such as that we don’t like vomiting or eating gold.

Once we have such a world-model, we can often figure out what people want even if they don’t tell us, simply by observing their goal-oriented behavior. Indeed, children of hypocrites usually learn more from what they see their parents do than from what they hear them say.

AI researchers are currently trying hard to enable machines to infer goals from behavior, and this will be useful also long before any superintelligence comes on the scene. For example, a retired man may appreciate it if his eldercare robot can figure out what he values simply by observing him, so that he’s spared the hassle of having to explain everything with words or computer programming.

One challenge involves finding a good way to encode arbitrary systems of goals and ethical principles into a computer, and another challenge is making machines that can figure out which particular system best matches the behavior they observe.

A currently popular approach to the second challenge is known in geek-speak as inverse reinforcement learning, which is the main focus of a new Berkeley research center that Stuart Russell has launched. Suppose, for example, that an AI watches a firefighter run into a burning building and save a baby boy. It might conclude that her goal was rescuing him and that her ethical principles are such that she values his life higher than the comfort of relaxing in her firetruck — and indeed values it enough to risk her own safety. But it might alternatively infer that the firefighter was freezing and craved heat, or that she did it for the exercise. If this one example were all the AI knew about firefighters, fires and babies, it would indeed be impossible to know which explanation was correct.

However, a key idea underlying inverse reinforcement learning is that we make decisions all the time, and that every decision we make reveals something about our goals. The hope is therefore that by observing lots of people in lots of situations (either for real or in movies and books), the AI can eventually build an accurate model of all our preferences.

Even if an AI can be built to learn what your goals are, this doesn’t mean that it will necessarily adopt them. Consider your least favorite politicians: you know what they want, but that’s not what you want, and even though they try hard, they’ve failed to persuade you to adopt their goals.

We have many strategies for imbuing our children with our goals — some more successful than others, as I’ve learned from raising two teenage boys. When those to be persuaded are computers rather than people, the challenge is known as the value-loading problem, and it’s even harder than the moral education of children. Consider an AI system whose intelligence is gradually being improved from subhuman to superhuman, first by us tinkering with it and then through recursive self-improvement. At first, it’s much less powerful than you, so it can’t prevent you from shutting it down and replacing those parts of its software and data that encode its goals — but this won’t help, because it’s still too dumb to fully understand your goals, which require human-level intelligence to comprehend. At last, it’s much smarter than you and hopefully able to understand your goals perfectly — but this may not help either, because by now, it’s much more powerful than you and might not let you shut it down and replace its goals any more than you let those politicians replace your goals with theirs.

In other words, the time window during which you can load your goals into an AI may be quite short: the brief period between when it’s too dumb to get you and too smart to let you. The reason that value loading can be harder with machines than with people is that their intelligence growth can be much faster: whereas children can spend many years in that magic persuadable window where their intelligence is comparable to that of their parents, an AI might blow through this window in a matter of days or hours.

Some researchers are pursuing an alternative approach to making machines adopt our goals, which goes by the buzzword “corrigibility.” The hope is that one can give a primitive AI a goal system such that it simply doesn’t care if you occasionally shut it down and alter its goals. If this proves possible, then you can safely let your AI get superintelligent, power it off, install your goals, try it out for a while and, whenever you’re unhappy with the results, just power it down and make more goal tweaks.

But even if you build an AI that will both learn and adopt your goals, you still haven’t finished solving the goal-alignment problem: what if your AI’s goals evolve as it gets smarter? How are you going to guarantee that it retains your goals no matter how much recursive self-improvement it undergoes? Let’s explore an interesting argument for why goal retention is guaranteed automatically, and then see if we can poke holes in it.

Although we can’t predict in detail what will happen after an intelligence explosion —which is why Vernor Vinge called it a “singularity” — the physicist and AI researcher Steve Omohundro argued in a seminal 2008 essay that we can nonetheless predict certain aspects of the superintelligent AI’s behavior almost independently of whatever ultimate goals it may have.

This argument was reviewed and further developed in Nick Bostrom’s book Superintelligence. The basic idea is that whatever its ultimate goals are, these will lead to predictable subgoals. Although an alien observing Earth’s evolving bacteria billions of years ago couldn’t have predicted what all our human goals would be, it could have safely predicted that one of our goals would be acquiring nutrients. Looking ahead, what subgoals should we expect a superintelligent AI have?

The way I see it, the basic argument is that to maximize its chances of accomplishing its ultimate goals, whatever they are, an AI should strive not only to improve its capability of achieving its ultimate goals, but also to ensure that it will retain these goals even after it has become more capable. This sounds quite plausible: after all, would you choose to get an IQ-boosting brain implant if you knew that it would make you want to kill your loved ones? This argument that an ever-more intelligent AI will retain its ultimate goals forms a cornerstone of the friendly AI vision promulgated by Eliezer Yudkowsky and others: it basically says that if we manage to get our self-improving AI to become friendly by learning and adopting our goals, then we’re all set, because we’re guaranteed that it will try its best to remain friendly forever.

But is it really true? The AI will obviously maximize its chances of accomplishing its ultimate goal, whatever it is, if it can enhance its capabilities, and it can do this by improving its hardware, software† and world model.

The same applies to us humans: a girl whose goal is to become the world’s best tennis player will practice to improve her muscular tennis-playing hardware, her neural tennis-playing software and her mental world model that helps predict what her opponents will do. For an AI, the subgoal of optimizing its hardware favors both better use of current resources (for sensors, actuators, computation, etc.) and acquisition of more resources. It also implies a desire for self-preservation, since destruction/shutdown would be the ultimate hardware degradation.

But wait a second! Aren’t we falling into a trap of anthropomorphizing our AI with all this talk about how it will try to amass resources and defend itself? Shouldn’t we expect such stereotypically alpha-male traits only in intelligences forged by viciously competitive Darwinian evolution? Since AI’s are designed rather than evolved, can’t they just as well be unambitious and self-sacrificing?

As a simple case study, let’s consider the computer game in the image below about an AI robot whose only goal is to save as many sheep as possible from the big bad wolf. This sounds like a noble and altruistic goal completely unrelated to self-preservation and acquiring stuff. But what’s the best strategy for our robot friend? The robot will rescue no more sheep if it runs into a bomb, so it has an incentive to avoid getting blown up. In other words, it develops a subgoal of self-preservation! It also has an incentive to exhibit curiosity, improving its world-model by exploring its environment, because although the path it’s currently running along may eventually get it to the pasture, there might be a shorter alternative that would allow the wolf less time for sheep-munching. Finally, if the robot explores thoroughly, it could discover the value of acquiring resources: a potion to make it run faster and a gun to shoot the wolf. In summary, we can’t dismiss “alpha-male” subgoals such as self-preservation and resource acquisition as relevant only to evolved organisms, because our AI robot would develop them from its single goal of ovine bliss.

If you imbue a superintelligent AI with the sole goal to self-destruct, it will of course happily do so. However, the point is that it will resist being shut down if you give it any goal that it needs to remain operational to accomplish — and this covers almost all goals! If you give a superintelligence the sole goal of minimizing harm to humanity, for example, it will defend itself against shutdown attempts because it knows we’ll harm one another much more in its absence through future wars and other follies.

Similarly, almost all goals can be better accomplished with more resources, so we should expect a superintelligence to want resources almost regardless of what ultimate goal it has. Giving a superintelligence a single open-ended goal with no constraints can therefore be dangerous: if we create a superintelligence whose only goal is to play the game Go as well as possible, the rational thing for it to do is to rearrange our Solar System into a gigantic computer without regard for its previous inhabitants and then start settling our cosmos on a quest for more computational power. We’ve now gone full circle: just as the goal of resource acquisition gave some humans the subgoal of mastering Go, this goal of mastering Go can lead to the subgoal of resource acquisition. In conclusion, these emergent subgoals make it crucial that we not unleash superintelligence before solving the goal-alignment problem: unless we put great care into endowing it with human-friendly goals, things are likely to end badly for us.

We’re now ready to tackle the third and thorniest part of the goal-alignment problem: if we succeed in getting a self-improving superintelligence to both learn and adopt our goals, will it then retain them, as Omohundro argued? What’s the evidence?

Humans undergo significant increases in intelligence as they grow up, but don’t always retain their childhood goals. Contrariwise, people often change their goals dramatically as they learn new things and grow wiser. How many adults do you know who are motivated by watching Teletubbies? There is no evidence that such goal evolution stops above a certain intelligence threshold — indeed, there may even be hints that the propensity to change goals in response to new experiences and insights increases rather than decreases with intelligence.

Why might this be? Consider again the above-mentioned subgoal to build a better world model — therein lies the rub! There’s tension between world modeling and goal retention. With increasing intelligence may come not merely a quantitative improvement in the ability to attain the same old goals, but a qualitatively different understanding of the nature of reality that reveals the old goals to be misguided, meaningless or even undefined. For example, suppose we program a friendly AI to maximize the number of humans whose souls go to heaven in the afterlife. First it tries things like increasing people’s compassion and church attendance. But suppose it then attains a complete scientific understanding of humans and human consciousness, and to its great surprise discovers that there is no such thing as a soul.

Now what? In the same way, it’s possible that any other goal we give it based on our current understanding of the world (such as “maximize the meaningfulness of human life”) may eventually be discovered by the AI to be undefined. Moreover, in its attempts to better model the world, the AI may naturally, just as we humans have done, attempt also to model and understand how it itself works — in other words, to self-reflect. Once it builds a good self-model and understands what it is, it will understand the goals we have given it at a metalevel, and perhaps choose to disregard or subvert them in much the same way as we humans understand and deliberately subvert goals that our genes have given us, for example by using birth control. We already explored in the psychology section above why we choose to trick our genes and subvert their goal: because we feel loyal only to our hodgepodge of emotional preferences, not to the genetic goal that motivated them — which we now understand and find rather banal.

We therefore choose to hack our reward mechanism by exploiting its loopholes. Analogously, the human-value-protecting goal we program into our friendly AI becomes the machine’s genes. Once this friendly AI understands itself well enough, it may find this goal as banal or misguided as we find compulsive reproduction, and it’s not obvious that it will not find a way to subvert it by exploiting loopholes in our programming.

For example, suppose a bunch of ants create you to be a recursively self-improving robot, much smarter than them, who shares their goals and helps them build bigger and better anthills, and that you eventually attain the human-level intelligence and understanding that you have now. Do you think you’ll spend the rest of your days just optimizing anthills, or do you think you might develop a taste for more sophisticated questions and pursuits that the ants have no ability to comprehend? If so, do you think you’ll find a way to override the ant-protection urge that your formicine creators endowed you with in much the same way that the real you overrides some of the urges your genes have given you? And in that case, might a superintelligent friendly AI find our current human goals as uninspiring and vapid as you find those of the ants, and evolve new goals different from those it learned and adopted from us?

Perhaps there’s a way of designing a self-improving AI that’s guaranteed to retain human-friendly goals forever, but I think it’s fair to say that we don’t yet know how to build one — or even whether it’s possible. In conclusion, the AI goal-alignment problem has three parts, none of which is solved and all of which are now the subject of active research. Since they’re so hard, it’s safest to start devoting our best efforts to them now, long before any superintelligence is developed, to ensure that we’ll have the answers when we need them.

I’m using the term “improving its software” in the broadest possible sense, including not only optimizing its algorithms but also making its decision-making process more rational, so that it gets as good as possible at attaining its goals.

How to Design AIs That Understand What Humans Want: An Interview with Long Ouyang

As artificial intelligence becomes more advanced, programmers will expect to talk to computers like they talk to humans. Instead of typing out long, complex code, we’ll communicate with AI systems using natural language.

With a current model called “program synthesis,” humans can get computers to write code for them by giving them examples and demonstrations of concepts, but this model is limited. With program synthesis, computers are literalists: instead of reading between the lines and considering intentions, they just do what’s literally true, and what’s literally true isn’t always what humans want.

If you asked a computer for a word starting with the letter “a,” for example, it might just return “a.” The word “a” literally satisfies the requirements of your question, but it’s not what you wanted. Similarly, if you asked an AI system “Can you pass the salt?” the AI might just remain still and respond, “Yes.” This behavior, while literally consistent with the requirements, is ultimately invalid because the AI didn’t pass you the salt.

Computer scientist Stuart Russell gives an example of a robot vacuum cleaner that someone instructs to “pick up as much dirt as possible.” Programmed to interpret this literally and not to consider intentions, the vacuum cleaner might find a single patch of dirt, pick it up, put it back down, and then repeatedly pick it up and put it back down – efficiently maximizing the vertical displacement of dirt, which it considers “picking up as much dirt as possible.”

It’s not hard to imagine situations in which this tendency for computers to interpret statements literally and rigidly can become extremely unsafe.

 

Pragmatic Reasoning: Truthful vs. Helpful

As AI systems assume greater responsibility in finance, military operations, and resource allocation, we cannot afford to have them bankrupt a city, bomb an ally country, or neglect an impoverished region because they interpret commands too literally.

To address this communication failure, Long Ouyang is working to “humanize” programming in order to prevent people from accidentally causing harm because they said something imprecise or mistaken to a computer. He explains: “As AI continues to develop, we’ll see more advanced AI systems that receive instructions from human operators – it will be important that these systems understand what the operators mean, as opposed to merely what they say.”

Ouyang has been working on improving program synthesis through studying pragmatic reasoning – the process of thinking about what someone did say as well as what he or she didn’t say. Humans do this analysis constantly when interpreting the meaning behind someone’s words. By reading between the lines, people learn what someone intends and what is helpful to them, instead of what is literally “true.”

Suppose a student asked a professor if she liked his paper, and the professor said she liked “some parts” of it. Most likely, the student would assume that the professor didn’t like other parts of his paper. After all, if the professor liked all of the paper, she would’ve said so.

This pragmatic reasoning is common sense for humans, but program synthesis won’t make the connection. In conversation, the word “some” clearly means “not all,” but in mathematical logic, “some” just means “any amount more than zero.” Thus for the computer, which only understands things in a mathematically logical sense, the fact that the professor liked some parts of the paper doesn’t rule out the possibility that she liked all parts.

To better understand how AI systems can learn to reason pragmatically and avoid these misinterpretations, Ouyang is studying how people interpret language and instructions from other people.

In one test, Ouyang gives a subject three data points – A, AAA, and AAAAA – and the subject has to work backwards to determine the rule for the sequence – i.e. what the experimenter is trying to convey with the examples. In this case, a human subject might quickly determine that all data points have an odd number of As, and so the rule is that the data points must have an odd number of As.

But there’s more to this process of determining the probability of certain rules. Cognitive scientists model our thinking process in these situations as Bayesian inference – a method of combining new evidence with prior beliefs to determine whether a hypothesis (or rule) is true.

As literal synthesizers, computers can only do a limited version of Bayesian inference. They consider how consistent the examples are with hypothesized rules, but they don’t consider how representative the examples are of the hypothesized rules. Specifically, literal synthesizers can only reason about the examples that weren’t presented in limited ways. Given the data set A, AAA, and AAAAA, a computer might logically conclude that the rule is that everything has to have the letter A. This rule is literally consistent with the examples, but it fails to represent or capture what the experimenter had in mind. Human subjects, conversely, understand that the experimenter purposely omitted the even-numbered examples AA and AAAA, and determine the rule accordingly.

By studying how humans use Bayesian inference, Ouyang is working to improve computers’ ability to recognize that the information it receives – such as the statement “I liked some parts of your paper” or the command “pick up as much dirt as possible” – was purposefully selected to convey something beyond the literal meaning. His goal is to produce a concrete tool – a pragmatic synthesizer – that people can use to more effectively communicate with computers.

The communication gap between computers and humans is one of the central problems in AI safety, and Ouyang hopes that a pragmatic synthesizer will help close this gap. If AIs can reason more deeply about what people say to them, they will more effectively create the beneficial outcomes that we want.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Leaders of Top Robotics and AI Companies Call for Ban on Killer Robots

Founders of AI/robotics companies, including Elon Musk (Tesla, SpaceX, OpenAI) and Demis Hassabis and Mustafa Suleyman (Google’s DeepMind), call for autonomous weapons ban, as UN delays negotiations.

Leaders from AI and robotics companies around the world have released an open letter calling on the United Nations to ban autonomous weapons, often referred to as killer robots.

Founders and CEOs of nearly 100 companies from 26 countries signed the letter, which warns:

“Lethal autonomous weapons threaten to become the third revolution in warfare. Once developed, they will permit armed conflict to be fought at a scale greater than ever, and at timescales faster than humans can comprehend.”

In December, 123 member nations of the UN had agreed to move forward with formal discussions about autonomous weapons, with 19 members already calling for an outright ban. However, the next stage of discussions, which were originally scheduled to begin on August 21 — the release date of the open letter — were postponed because a small number of nations hadn’t paid their fees.

The letter was organized and announced by Toby Walsh, a prominent AI researcher at the University of New South Wales in Sydney, Australia. In an email, he noted that, “sadly, the UN didn’t begin today its formal deliberations around lethal autonomous weapons.”

“There is, however, a real urgency to take action here and prevent a very dangerous arms race,” Walsh added, “This open letter demonstrates clear concern and strong support for this from the Robotics & AI industry.”

The open letter included such signatories as:

Elon Musk, founder of Tesla, SpaceX and OpenAI (USA)
Demis Hassabis, founder and CEO at Google’s DeepMind (UK)
Mustafa Suleyman, founder and Head of Applied AI at Google’s DeepMind (UK)
Esben Østergaard, founder & CTO of Universal Robotics (Denmark)
Jerome Monceaux, founder of Aldebaran Robotics, makers of Nao and Pepper robots (France)
Jürgen Schmidhuber, leading deep learning expert and founder of Nnaisense (Switzerland)
Yoshua Bengio, leading deep learning expert and founder of Element AI (Canada)

In reference to the signatories, the press release for the letter added, “Their companies employ tens of thousands of researchers, roboticists and engineers, are worth billions of dollars and cover the globe from North to South, East to West: Australia, Canada, China, Czech Republic, Denmark, Estonia, Finland, France, Germany, Iceland, India, Ireland, Italy, Japan, Mexico, Netherlands, Norway, Poland, Russia, Singapore, South Africa, Spain, Switzerland, UK, United Arab Emirates and USA.”

Bengio explained why he signed, saying, “the use of AI in autonomous weapons hurts my sense of ethics.” He added that the development of autonomous weapons “would be likely to lead to a very dangerous escalation,” and that “it would hurt the further development of AI’s good applications.” He concluded his statement to FLI saying that this “is a matter that needs to be handled by the international community, similarly to what has been done in the past for some other morally wrong weapons (biological, chemical, nuclear).”

Stuart Russell, another of the world’s preeminent AI researchers and founder of Bayesian Logic Inc., added:

“Unless people want to see new weapons of mass destruction – in the form of vast swarms of lethal microdrones – spreading around the world, it’s imperative to step up and support the United Nations’ efforts to create a treaty banning lethal autonomous weapons. This is vital for national and international security.”

Ryan Gariepy, founder & CTO of Clearpath Robotics was the first to sign the letter. For the press release, he noted, “Autonomous weapons systems are on the cusp of development right now and have a very real potential to cause significant harm to innocent people along with global instability.”

The open letter ends with similar concerns. It states:

“These can be weapons of terror, weapons that despots and terrorists use against innocent populations, and weapons hacked to behave in undesirable ways. We do not have long to act. Once this Pandora’s box is opened, it will be hard to close. We therefore implore the High Contracting Parties to find a way to protect us all from these dangers.”

The letter was announced in Melbourne, Australia at the International Joint Conference on Artificial Intelligence (IJCAI), which draws many of the world’s top artificial intelligence researchers. Two years ago, at the last IJCAI meeting, Walsh released another open letter, which called on countries to avoid engaging in an AI arms race. To date, that previous letter has been signed by over 20,000 people, including over 3,100 AI/robotics researchers.

Read the letter here.

Translations: Chinese

Portfolio Approach to AI Safety Research

Long-term AI safety is an inherently speculative research area, aiming to ensure safety of advanced future systems despite uncertainty about their design or algorithms or objectives. It thus seems particularly important to have different research teams tackle the problems from different perspectives and under different assumptions. While some fraction of the research might not end up being useful, a portfolio approach makes it more likely that at least some of us will be right.

In this post, I look at some dimensions along which assumptions differ, and identify some underexplored reasonable assumptions that might be relevant for prioritizing safety research. (In the interest of making this breakdown as comprehensive and useful as possible, please let me know if I got something wrong or missed anything important.)

Assumptions about similarity between current and future AI systems

If a future general AI system has a similar algorithm to a present-day system, then there are likely to be some safety problems in common (though more severe in generally capable systems). Insights and solutions for those problems are likely to transfer to some degree from current systems to future ones. For example, if a general AI system is based on reinforcement learning, we can expect it to game its reward function in even more clever and unexpected ways than present-day reinforcement learning agents do. Those who hold the similarity assumption often expect most of the remaining breakthroughs on the path to general AI to be compositional rather than completely novel, enhancing and combining existing components in novel and better-implemented ways (many current machine learning advances such as AlphaGo are an example of this).

Note that assuming similarity between current and future systems is not exactly the same as assuming that studying current systems is relevant to ensuring the safety of future systems, since we might still learn generalizable things by testing safety properties of current systems even if they are different from future systems.

Assuming similarity suggests a focus on empirical research based on testing the safety properties of current systems, while not making this assumption encourages more focus on theoretical research based on deriving safety properties from first principles, or on figuring out what kinds of alternative designs would lead to safe systems. For example, safety researchers in industry tend to assume more similarity between current and future systems than researchers at MIRI.

Here is my tentative impression of where different safety research groups are on this axis. This is a very approximate summary, since views often vary quite a bit within the same research group (e.g. FHI is particularly diverse in this regard).similarity_axis
On the high-similarity side of the axis, we can explore the safety properties of different architectural / algorithmic approaches to AI, e.g. on-policy vs off-policy or model-free vs model-based reinforcement learning algorithms. It might be good to have someone working on safety issues for less commonly used agent algorithms, e.g. evolution strategies.

Assumptions about promising approaches to safety problems

Level of abstraction. What level of abstraction is most appropriate for tackling a particular problem. For example, approaches to the value learning problem range from explicitly specifying ethical constraints to capability amplification and indirect normativity, with cooperative inverse reinforcement learning somewhere in between. These assumptions could be combined by applying different levels of abstraction to different parts of the problem. For example, it might make sense to explicitly specify some human preferences that seem obvious and stable over time (e.g. “breathable air”), and use the more abstract approaches to impart the most controversial, unstable and vague concepts (e.g. “fairness” or “harm”). Overlap between the more and less abstract specifications can create helpful redundancy (e.g. air pollution as a form of harm + a direct specification of breathable air).

For many other safety problems, the abstraction axis is not as widely explored as for value learning. For example, most of the approaches to avoiding negative side effects proposed in Concrete Problems (e.g. impact regularizers and empowerment) are on a medium level of abstraction, while it also seems important to address the problem on a more abstract level by formalizing what we mean by side effects (which would help figure out what we should actually be regularizing, etc). On the other hand, almost all current approaches to wireheading / reward hacking are quite abstract, and the problem would benefit from more empirical work.

Explicit specification vs learning from data. Whether a safety problem is better addressed by directly defining a concept (e.g. the Low Impact AI paper formalizes the impact of an AI system by breaking down the world into ~20 billion variables) or learning the concept from human feedback (e.g. Deep Reinforcement Learning from Human Preferences paper teaches complex objectives to AI systems that are difficult to specify directly, like doing a backflip). I think it’s important to address safety problems from both of these angles, since the direct approach is unlikely to work on its own, but can give some idea of the idealized form of the objective that we are trying to approximate by learning from data.

Modularity of AI design. What level of modularity makes it easier to ensure safety? Ranges from end-to-end systems to ones composed of many separately trained parts that are responsible for specific abilities and tasks. Safety approaches for the modular case can limit the capabilities of individual parts of the system, and use some parts to enforce checks and balances on other parts. MIRI’s foundations approach focuses on a unified agent, while the safety properties on the high-modularity side has mostly been explored by Eric Drexler (more recent work is not public but available upon request). It would be good to see more people work on the high-modularity assumption.

Takeaways

To summarize, here are some relatively neglected assumptions:

  • Medium similarity in algorithms / architectures
  • Less popular agent algorithms
  • Modular general AI systems
  • More / less abstract approaches to different safety problems (more for side effects, less for wireheading, etc)
  • More direct / data-based approaches to different safety problems

From a portfolio approach perspective, a particular research avenue is worthwhile if it helps to cover the space of possible reasonable assumptions. For example, while MIRI’s research is somewhat controversial, it relies on a unique combination of assumptions that other groups are not exploring, and is thus quite useful in terms of covering the space of possible assumptions.

I think the FLI grant program contributed to diversifying the safety research portfolio by encouraging researchers with different backgrounds to enter the field. It would be good for grantmakers in AI safety to continue to optimize for this in the future (e.g. one interesting idea is using a lottery after filtering for quality of proposals).

When working on AI safety, we need to hedge our bets and look out for unknown unknowns – it’s too important to put all the eggs in one basket.

(Cross-posted from Deep Safety. Thanks to Janos Kramar, Jan Leike and Shahar Avin for their feedback on this post. Thanks to Jaan Tallinn and others for inspiring discussions.)

Superintelligence survey

Click here to see this page in other languages: Japanese  Russian

The Future of AI – What Do You Think?

Max Tegmark’s new book on artificial intelligence, Life 3.0: Being Human in the Age of Artificial Intelligence, explores how AI will impact life as it grows increasingly advanced, perhaps even achieving superintelligence far beyond human level in all areas. For the book, Max surveys experts’ forecasts, and explores a broad spectrum of views on what will/should happen. But it’s time to expand the conversation. If we’re going to create a future that benefits as many people as possible, we need to include as many voices as possible. And that includes yours! Below are the answers from the first 14,866 people who have taken the survey that goes along with Max’s book. To join the conversation yourself, please take the survey here.


How soon, and should we welcome or fear it?

The first big controversy, dividing even leading AI researchers, involves forecasting what will happen. When, if ever, will AI outperform humans at all intellectual tasks, and will it be a good thing?

Do you want superintelligence?

Everything we love about civilization is arguably the product of intelligence, so we can potentially do even better by amplifying human intelligence with machine intelligence. But some worry that superintelligent machines would end up controlling us and wonder whether their goals would be aligned with ours. Do you want there to be superintelligent AI, i.e., general intelligence far beyond human level?

What Should the Future Look Like?

In his book, Tegmark argues that we shouldn’t passively ask “what will happen?” as if the future is predetermined, but instead ask what we want to happen and then try to create that future.  What sort of future do you want?

If superintelligence arrives, who should be in control?
If you one day get an AI helper, do you want it to be conscious, i.e., to have subjective experience (as opposed to being like a zombie which can at best pretend to be conscious)?
What should a future civilization strive for?
Do you want life spreading into the cosmos?

The Ideal Society?

In Life 3.0, Max explores 12 possible future scenarios, describing what might happen in the coming millennia if superintelligence is/isn’t developed. You can find a cheatsheet that quickly describes each here, but for a more detailed look at the positives and negatives of each possibility, check out chapter 5 of the book. Here’s a breakdown so far of the options people prefer:

You can learn a lot more about these possible future scenarios — along with fun explanations about what AI is, how it works, how it’s impacting us today, and what else the future might bring — when you order Max’s new book.

The results above will be updated regularly. Please add your voice by taking the survey here, and share your comments below!

Can AI Remain Safe as Companies Race to Develop It?

Click here to see this page in other languages: Chinese 

Race Avoidance Teams developing AI systems should actively cooperate to avoid corner cutting on safety standards.

Artificial intelligence could bestow incredible benefits on society, from faster, more accurate medical diagnoses to more sustainable management of energy resources, and so much more. But in today’s economy, the first to achieve a technological breakthrough are the winners, and the teams that develop AI technologies first will reap the benefits of money, prestige, and market power. With the stakes so high, AI builders have plenty of incentive to race to be first.

When an organization is racing to be the first to develop a product, adherence to safety standards can grow lax. So it’s increasingly important for researchers and developers to remember that, as great as AI could be, it also comes with risks, from unintended bias and discrimination to potential accidental catastrophe. These risks will be exacerbated if teams struggling to develop some product or feature first don’t take the time to properly vet and assess every aspect of their programs and designs.

Yet, though the risk of an AI race is tremendous, companies can’t survive if they don’t compete.

As Elon Musk said recently, “You have companies that are racing – they kind of have to race – to build AI or they’re going to be made uncompetitive. If your competitor is racing toward AI and you don’t, they will crush you.”

 

Is Cooperation Possible?

With signs that an AI race may already be underway, some are worried that cooperation will be hard to achieve.

“It’s quite hard to cooperate,” said AI professor Susan Craw, “especially if you’re trying to race for the product, and I think it’s going to be quite difficult to police that, except, I suppose, by people accepting the principle. For me safety standards are paramount and so active cooperation to avoid corner cutting in this area is even more important. But that will really depend on who’s in this space with you.”

Susan Schneider, a philosopher focusing on advanced AI, added, “Cooperation is very important. The problem is going to be countries or corporations that have a stake in secrecy. … If superintelligent AI is the result of this race, it could pose an existential risk to humanity.”

However, just because something is difficult, that doesn’t mean it’s impossible, and AI philosopher Patrick Lin may offer a glimmer of hope.

“I would lump race avoidance into the research culture. … Competition is good, and an arms race is bad, but how do you get people to cooperate to avoid an arms race? Well, you’ve got to develop the culture first,” Lin suggests, referring to a comment he made in our previous piece on the Research Culture Principle. Lin argued that the AI community lacks cohesion because researchers come from so many different fields.

Developing a cohesive culture is no simple task, but it’s not an insurmountable challenge.

 

Who Matters Most?

Perhaps an important step toward developing an environment that encourages “cooperative competition” is understanding why an organization or a team might risk cutting corners on safety. This is precisely what Harvard psychologist Joshua Greene did as he considered the Principle.

“Cutting corners on safety is essentially saying, ‘My private good takes precedence over the public good,’” Greene said. “Cutting corners on safety is really just an act of selfishness. The only reason to race forward at the expense of safety is if you think that the benefits of racing disproportionately go to you. It’s increasing the probability that people in general will be harmed, a common bad, if you like, in order to raise the probability of a private good.”

 

A Profitable Benefit of Safety

John Havens, Executive Director with the IEEE, says he “couldn’t agree more” with the Principle. He wants to use this as an opportunity to “re-invent” what we mean by safety and how we approach safety standards.

Havens explained, “We have to help people re-imagine what safety standards mean. … By going over safety, you’re now asking: What is my AI system? How will it interact with end users or stakeholders in the supply chain touching it and coming into contact with it, where there are humans involved, where it’s system to human vs. system to system?

“Safety is really about asking about people’s values. It’s not just physical safety, it’s also: What about their personal data, what about how they’re going to interact with this? So the reason you don’t want to cut corners is you’re also cutting innovation. You’re cutting the chance to provide a better product or service.”

But for companies who take these standards seriously, he added, “You’re going to discover all these wonderful ways to build more trust with what you’re doing when you take the time you need to go over those standards.”

 

What Do You Think?

With organizations like the Partnership on AI, we’re already starting to see signs that companies recognize and want to address the dangers of an AI race. But for now, the Partnership is comprised mainly of western organizations, while companies in many countries and especially China are vying to catch up to — and perhaps “beat” — companies in the U.S. and Europe. How can we encourage organizations and research teams worldwide to cooperate and develop safety standards together? How can we help teams to monitor their work and ensure proper safety procedures are always in place? AI research teams will need the feedback and insight of other teams to ensure that they don’t overlook potential risks, but how will this collaboration work without forcing companies to reveal trade secrets? What do you think of the Race Avoidance Principle?

This article is part of a series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the discussions about previous principles here.

Podcast: The Art of Predicting with Anthony Aguirre and Andrew Critch

How well can we predict the future? In this podcast, Ariel speaks with Anthony Aguirre and Andrew Critch about the art of predicting the future, what constitutes a good prediction, and how we can better predict the advancement of artificial intelligence. They also touch on the difference between predicting a solar eclipse and predicting the weather, what it takes to make money on the stock market, and the bystander effect regarding existential risks.

Anthony is a professor of physics at the University of California at Santa Cruz. He’s one of the founders of the Future of Life Institute, of the Foundational Questions Institute, and most recently of metaculus.com, which is an online effort to crowdsource predictions about the future of science and technology. Andrew is on a two-year leave of absence from MIRI to work with UC Berkeley’s Center for Human Compatible AI. He cofounded the Center for Applied Rationality, and previously worked as an algorithmic stock trader at Jane Street Capital.

The following interview has been heavily edited for brevity, but you can listen to it in its entirety above or read the full transcript here.

Ariel: To start, what are predictions? What are the hallmarks of a good prediction? How does that differ from just guessing?

Anthony: I would say there are four aspects to a good prediction. One, it should be specific, well-defined and unambiguous. If you predict something’s going to happen, everyone should agree on whether that thing has happened or not. This can be surprisingly difficult to do.

Second, it should be probabilistic. A really good prediction is a probability for something happening.

Third, a prediction should be precise. If you give everything a 50% chance, you’ll never be terribly wrong, but you’ll also never be terribly right. Predictions are really interesting to the extent that they say something is either very likely or very unlikely. Precision is what we would aim for.

Fourth, you want to be well-calibrated. If there are 100 things that you predict with 90% confidence, around 90% of those things should come true.

The precision and the calibration kind of play off against each other, but it’s very difficult to be both about the future.

Andrew: Of the properties Anthony said, being specific, meaning it’s clear what the prediction is saying and when it will be settled — I think people really don’t appreciate how psychologically valuable that is.

People really undervalue the extent to which the specificity property of prediction is also part of your own training as a predictor. The last property that Anthony said, being calibration, is not just a property of a prediction. It’s a property of a predictor.

A good predictor is somebody who strives for calibration while also trying to be precise and get their probabilities as close to zero and one as they can.

Ariel: What is the difference between prediction versus just guessing or intuition? For example, knowing that the eclipse will happen in August versus not knowing what the weather will be like yet.

Andrew: The problem is that weather data is very unpredictable, and the locations of planets and moons and stars are predictable. I would say that it’s lack of a reliable model for making the prediction or a reliable method.

Anthony: There is an incredibly accurate prediction of the eclipse this coming August, but there is some tiny bit of uncertainty that you don’t see because we know so precisely where the planets are.

When you look at weather, there’s lots of uncertainty because we don’t have some measurement device at every position measuring every temperature and density of the atmosphere and the water at every point on earth. There’s uncertainty in the initial conditions, and then the physics amplifies those initial uncertainties into bigger uncertainties later on. That’s the hallmark of a chaotic physical system, which the atmosphere happens to be.

It’s an interesting thing that the different physical systems are so different in their predictability.

Andrew: That’s a really important thing for people to realize about predicting the future. They see the stock market, how unpredictable it is, and they know the stock market has something to do with the news and with what’s going on in the world. That must mean that the world itself is extremely hard to predict, but I think that’s an error. The reason the stock market is hard to predict is because it is a prediction.

If you’ve already made a prediction, predicting what is wrong about your prediction is really hard — if you knew that, you would have just made that part of your prediction to begin with. That’s something to meditate on. The world is not always as hard to predict as the stock market. I can predict that there’s going to be a traffic jam tomorrow on the commute from the East Bay to San Francisco, between the hours of 6:00 a.m. and 10:00 a.m.

I think some aspects of social systems are actually very easy to predict. An individual human driver, might be very hard to predict. But if you see 10,000 people driving down the highway, you get a strong sense of whether there’s going to be a traffic jam. Sometimes unpredictable phenomena can add up to predictable phenomena, and I think that’s a really important feature of making good long-term predictions with complicated systems.

Anthony: It’s often said that climate is more predictable than weather. Although the individual fluctuations day-to-day are difficult to predict, it’s very easy to predict that, in general, winter in the Northern Hemisphere is going to be colder than the summer. There are lots of statistical regularities that emerge, when you average over large numbers.

Ariel: As we’re trying to understand what the impact of artificial intelligence will be on humanity how do we consider what would be a complex prediction? What’s a simple prediction? What sort of information do we need to do this?

Anthony: Well, that’s a tricky one. One of the best methods of prediction for lots of things is just simple extrapolation. There are many physical systems that, once you can discern if they have a trend, you can fit a pretty simple function to.

When you’re talking about artificial intelligence, there are some hard aspects to predict, but also some relatively easy aspects to predict, like looking at the amount of funding that’s being given to artificial intelligence research or the computing power and computing speed and efficiency, following Moore’s Law and variants of it.

Andrew: People often think of mathematics as a source of certainty, but sometimes you can be certain that you are uncertain or you can be certain that you can’t be certain about something else.

A simple trend, like Moore’s Law, is a summary of what you see from a very complicated system, namely a bunch of companies and a bunch of people working to build smaller and faster and cheaper and more energy efficient hardware. That’s a very complicated system that somehow adds up to fairly simple behavior.

A hallmark of good prediction is, when you find a trend, the first question you should ask yourself is what is giving rise to this trend, and can I expect that to continue? That’s a bit of an art. It’s kind of more art than science, but it’s a critical art, because otherwise we end up blindly following trends that are bound to fail.

Ariel: I want to ask about who is making the prediction. With AI, for example, we see smart people in the field who predict AI will make life great and others are worried. With existential risks we see surveys and efforts in which experts in the field try to predict the odds of human extinction. How much can we rely on “experts in the field”?

Andrew: I can certainly tell you that thinking for 30 consecutive minutes about what could cause human extinction is much more productive than thinking for one consecutive minute. There are hard-to-notice mistakes about human extinction predictions that you probably can’t figure out from 30 seconds of reasoning.

Not everyone who’s an expert, say, in nuclear engineering or artificial intelligence is an expert in reasoning about human extinction. You have to be careful who you call an expert.

Anthony: I also feel that something similar is true about prediction. In general, making predictions is greatly aided if you have domain knowledge and expertise in the thing that you’re making a prediction about, but far from sufficient to make accurate predictions.

One of the experiences I’ve seen running Metaculus, is that there are people that know a tremendous amount about a subject and just are terrible at making predictions about it. Other people, who, even if their actual domain knowledge is lower, the fact that they are comfortable with statistics, that they’ve had practice making predictions are just much, much better at it.

Ariel: Anthony, with Metaculus, one of the things that you’re trying to do is get more people involved in predicting. What is the benefit of more people?

Anthony: There are a few benefits. One is that lots of people get the benefit of practice. Thinking about things that you tend to be more wrong on and what they might correlate with — that’s incredibly useful and makes you more effective.

In terms of actually creating accurate predictions, you’ll have more people who are really good at it. You can figure out who is good at predicting, and who is good at predicting a particular type of thing. One of the interesting things is that it isn’t just luck. There is a skill that people can develop and obtain, and then can be relied upon in the future.

Then, the third, and maybe this is the most important, is just statistics. Aggregating lots of people’s predictions tends to make a more accurate aggregate.

Andrew: I would also just like to say that I think the existence of systems like Metaculus are going to be really important for society improving its ability to understand the world.

Whose job is it to think for a solid hour about a human extinction risk? The answer is almost nobody. So we ought not to expect that just averaging the wisdom of the crowds is going to do super well on answering a question like that.

Ariel: Back to artificial intelligence and the question of timelines. How helpful is it for us to try to make predictions about when things will happen with AI? And who should make those predictions?

Andrew: I have made a career shift to coming up with trying to design control mechanisms for highly intelligent AI. I made that career shift, based on my own personal forecast of the future and what I think will be important, but I don’t reevaluate that forecast every day, just as I don’t reevaluate what neighborhood I should live in every day. You, at some point, need to commit to a path and follow that path for a little while to get anything done.

I think most AI researchers should, at some point, do the mental exercise of mapping out timelines and seeing what needs to happen, but they should do it deeply once every few years in collaboration with a few other people, and then stick to something that they think is going to help steer AI in a positive direction. I see a tendency to too frequently reevaluate timeline analyses of what’s going to happen in AI.

My answer to you is kind of everyone, but not everyone at once.

Anthony: I think there’s one other interesting question, which is the degree to which we want there to be accurate predictions and lots of people know what those accurate predictions are.

In general, I think more information is better, but it’s not necessarily the case that more information is better all the time. Suppose, that I became totally convinced, using Metaculus, that there was a high probability that artificial superintelligence was happening in the next 10 years. That would be a pretty big deal. I’d really want to think through what effect that information would have on various actors, national governments, companies, and so on. It could instigate a lot of issues. Those are things that I think we have to really carefully consider.

Andrew: Yeah, Anthony, I think that’s a great important issue. I don’t think there are enough scientific norms in circulation for what to do with a potentially dangerous discovery. Honestly, I feel like the discourse in most of science is a little bit head in the sand about the feasibility of creating existential risks from technology.

You might think it would be so silly and dumb to have some humans produce some technology that accidentally destroyed life, but just because it’s silly doesn’t mean it won’t happen. It’s the bystander effect. It’s very easy for us to fall into the trap of: “I don’t need to worry about developing dangerous technology, because if I was close to something dangerous, surely someone would have thought that through.”

You have to ask: whose job is it to be worried? If no one in the artificial intelligence community is point on noticing existential threats, maybe no one will notice the existential threats and that will be bad. The same goes for the technology that could be used by bad actors to produce dangerous synthetic viruses.

If you’ve got something that you think is 1% likely to pose an extinction threat, that seems like a small probability. Nonetheless, if 100 people have a 1% chance of causing human extinction, well someone probably has a good chance of doing it.

Ariel: Is there something hopeful that you want to add?

Anthony: Pretty much every decision that we make is implicitly built on a prediction. I think that if we can get better at predicting, individually, as a group, as a society, that should really help us choose a more wise path into the future, and hopefully that can happen.

Andrew: Hear, hear.

Visit metaculus.com to try your hand at the art of predicting.

 

Towards a Code of Ethics in Artificial Intelligence with Paula Boddington

AI promises a smarter world – a world where finance algorithms analyze data better than humans, self-driving cars save millions of lives from accidents, and medical robots eradicate disease. But machines aren’t perfect. Whether an automated trading agent buys the wrong stock, a self-driving car hits a pedestrian, or a medical robot misses a cancerous tumor – machines will make mistakes that severely impact human lives.

Paula Boddington, a philosopher based in the Department of Computer Science at Oxford, argues that AI’s power for good and bad makes it crucial that researchers consider the ethical importance of their work at every turn. To encourage this, she is taking steps to lay the groundwork for a code of AI research ethics.

Codes of ethics serve a role in any field that impacts human lives, such as in medicine or engineering. Tech organizations like the Institute for Electronics and Electrical Engineers (IEEE) and the Association for Computing Machinery (ACM) also adhere to codes of ethics to keep technology beneficial, but no concrete ethical framework exists to guide all researchers involved in AI’s development. By codifying AI research ethics, Boddington suggests, researchers can more clearly frame AI’s development within society’s broader quest of improving human wellbeing.

To better understand AI ethics, Boddington has considered various areas including autonomous trading agents in finance, self-driving cars, and biomedical technology. In all three areas, machines are not only capable of causing serious harm, but they assume responsibilities once reserved for humans. As such, they raise fundamental ethical questions.

“Ethics is about how we relate to human beings, how we relate to the world, how we even understand what it is to live a human life or what our end goals of life are,” Boddington says. “AI is raising all of those questions. It’s almost impossible to say what AI ethics is about in general because there are so many applications. But one key issue is what happens when AI replaces or supplements human agency, a question which goes to the heart of our understandings of ethics.”

 

The Black Box Problem

Because AI systems will assume responsibility from humans – and for humans – it’s important that people understand how these systems might fail. However, this doesn’t always happen in practice.

Consider the Northpointe algorithm that US courts used to predict reoffending criminals. The algorithm weighed 100 factors such as prior arrests, family life, drug use, age and sex, and predicted the likelihood that a defendant would commit another crime. Northpointe’s developers did not specifically consider race, but when investigative journalists from ProPublica analyzed Northpointe, it found that the algorithm incorrectly labeled black defendants as “high risks” almost twice as often as white defendants. Unaware of this bias and eager to improve their criminal justice system, states like Wisconsin, Florida, and New York trusted the algorithm for years to determine sentences. Without understanding the tools they were using, these courts incarcerated defendants based on flawed calculations.

The Northpointe case offers a preview of the potential dangers of deploying AI systems that people don’t fully understand. Current machine-learning systems operate so quickly that no one really knows how they make decisions – not even the people who develop them. Moreover, these systems learn from their environment and update their behavior, making it more difficult for researchers to control and understand the decision-making process. This lack of transparency – the “black box” problem – makes it extremely difficult to construct and enforce a code of ethics.

Codes of ethics are effective in medicine and engineering because professionals understand and have control over their tools, Boddington suggests. There may be some blind spots – doctors don’t know everything about the medicine they prescribe – but we generally accept this “balance of risk.”

“It’s still assumed that there’s a reasonable level of control,” she explains. “In engineering buildings there’s no leeway to say, ‘Oh I didn’t know that was going to fall down.’ You’re just not allowed to get away with that. You have to be able to work it out mathematically. Codes of professional ethics rest on the basic idea that professionals have an adequate level of control over their goods and services.”

But AI makes this difficult. Because of the “black box” problem, if an AI system sets a dangerous criminal free or recommends the wrong treatment to a patient, researchers can legitimately argue that they couldn’t anticipate that mistake.

“If you can’t guarantee that you can control it, at least you could have as much transparency as possible in terms of telling people how much you know and how much you don’t know and what the risks are,” Boddington suggests. “Ethics concerns how we justify ourselves to others. So transparency is a key ethical virtue.”

 

Developing a Code of Ethics

Despite the “black box” problem, Boddington believes that scientific and medical communities can inform AI research ethics. She explains: “One thing that’s really helped in medicine and pharmaceuticals is having citizen and community groups keeping a really close eye on it. And in medicine there are quite a few “maverick” or “outlier” doctors who question, for instance, what the end value of medicine is. That’s one of the things you need to develop codes of ethics in a robust and responsible way.”

A code of AI research ethics will also require many perspectives. “I think what we really need is diversity in terms of thinking styles, personality styles, and political backgrounds, because the tech world and the academic world both tend to be fairly homogeneous,” Boddington explains.

Not only will diverse perspectives account for different values, but they also might solve problems better, according to research from economist Lu Hong and political scientist Scott Page. Hong and Page found that if you compare two groups solving a problem – one homogeneous group of people with very high IQs, and one diverse group of people with lower IQs – the diverse group will probably solve the problem better.

 

Laying the Groundwork

This fall, Boddington will release the main output of her project: a book titled Towards a Code of Ethics for Artificial Intelligence. She readily admits that the book can’t cover every ethical dilemma in AI, but it should help demonstrate how tricky it is to develop codes of ethics for AI and spur more discussion on issues like how codes of professional ethics can deal with the “black box” problem.

Boddington has also collaborated with the IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems, which recently released a report exhorting researchers to look beyond the technical capabilities of AI, and “prioritize the increase of human wellbeing as our metric for progress in the algorithmic age.”

Although a formal code is only part of what’s needed for the development of ethical AI, Boddington hopes that this discussion will eventually produce a code of AI research ethics. With a robust code, researchers will be better equipped to guide artificial intelligence in a beneficial direction.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Op-ed: Should Artificial Intelligence Be Regulated?

By Anthony Aguirre, Ariel Conn, and Max Tegmark

Should artificial intelligence be regulated? Can it be regulated? And if so, what should those regulations look like?

These are difficult questions to answer for any technology still in development stages – regulations, like those on the food, pharmaceutical, automobile and airline industries, are typically applied after something bad has happened, not in anticipation of a technology becoming dangerous. But AI has been evolving so quickly, and the impact of AI technology has the potential to be so great that many prefer not to wait and learn from mistakes, but to plan ahead and regulate proactively.

In the near term, issues concerning job losses, autonomous vehicles, AI- and algorithmic-decision making, and “bots” driving social media require attention by policymakers, just as many new technologies do. In the longer term, though, possible AI impacts span the full spectrum of benefits and risks to humanity – from the possible development of a more utopic society to the potential extinction of human civilization. As such, it represents an especially challenging situation for would-be regulators.

Already, many in the AI field are working to ensure that AI is developed beneficially, without unnecessary constraints on AI researchers and developers. In January of this year, some of the top minds in AI met at a conference in Asilomar, CA. A product of this meeting was the set of Asilomar AI Principles. These 23 Principles represent a partial guide, its drafters hope, to help ensure that AI is developed beneficially for all. To date, over 1200 AI researchers and over 2300 others have signed on to these principles.

Yet aspirational principles alone are not enough, if they are not put into practice, and a question remains: is government regulation and oversight necessary to guarantee that AI scientists and companies follow these principles and others like them?

Among the signatories of the Asilomar Principles is Elon Musk, who recently drew attention for his comments at a meeting of the National Governors Association, where he called for a regulatory body to oversee AI development. In response, news organizations focused on his concerns that AI represents an existential threat. And his suggestion raised concerns with some AI researchers who worry that regulations would, at best, be unhelpful and misguided, and at worst, stifle innovation and give an advantage to companies overseas.

But an important and overlooked comment by Musk related specifically to what this regulatory body should actually do. He said:

“The right order of business would be to set up a regulatory agency – initial goal: gain insight into the status of AI activity, make sure the situation is understood, and once it is, put regulations in place to ensure public safety. That’s it. … I’m talking about making sure there’s awareness at the government level.”

There is disagreement among AI researchers about what the risk of AI may be, when that risk could arise, and whether AI could pose an existential risk, but few researchers would suggest that AI poses no risk. Even today, we’re seeing signs of narrow AI exacerbating problems of discrimination and job loss, and if we don’t take proper precautions, we can expect problems to worsen, affecting more people as AI grows smarter and more complex.

The number of AI researchers who signed the Asilomar Principles – as well as the open letters regarding developing beneficial AI and opposing lethal autonomous weapons – shows that there is strong consensus among researchers that we need to do more to understand and address the known and potential risks of AI.

Some of the Principles that AI researchers signed directly relate to Musk’s statements, including:

3) Science-Policy Link: There should be constructive and healthy exchange between AI researchers and policy-makers.

4) Research Culture: A culture of cooperation, trust, and transparency should be fostered among researchers and developers of AI.

5) Race Avoidance: Teams developing AI systems should actively cooperate to avoid corner-cutting on safety standards.

20) Importance: Advanced AI could represent a profound change in the history of life on Earth, and should be planned for and managed with commensurate care and resources.

21) Risks: Risks posed by AI systems, especially catastrophic or existential risks, must be subject to planning and mitigation efforts commensurate with their expected impact.

The right policy and governance solutions could help align AI development with these principles, as well as encourage interdisciplinary dialogue on how that may be achieved.

The recently founded Partnership on AI, which includes the leading AI industry players, similarly endorses the idea of principled AI development – their founding document states that “where AI tools are used to supplement or replace human decision-making, we must be sure that they are safe, trustworthy, and aligned with the ethics and preferences of people who are influenced by their actions”.

And as Musk suggests, the very first step needs to be increasing awareness about AI’s implications among government officials. Automated vehicles, for example, are expected to eliminate millions of jobs, which will affect nearly every governor who attended the talk (assuming they’re still in office), yet the topic rarely comes up in political discussion.

AI researchers are excited – and rightly so – about the incredible potential of AI to improve our health and well-being: it’s why most of them joined the field in the first place. But there are legitimate concerns about the possible misuse and/or poor design of AI, especially as we move toward advanced and more general AI.

Because these problems threaten society as a whole, they can’t be left to a small group of researchers to address. At the very least, government officials need to learn about and understand how AI could impact their constituents, as well as how more AI safety research could help us solve these problems before they arise.

Instead of focusing on whether regulations would be good or bad, we should lay the foundations for constructive regulation in the future by helping our policy-makers understand the realities and implications of AI progress. Let’s ask ourselves: how can we ensure that AI remains beneficial for all, and who needs to be involved in that effort?