## Can We Properly Prepare for the Risks of Superintelligent AI?

Risks Principle: Risks posed by AI systems, especially catastrophic or existential risks, must be subject to planning and mitigation efforts commensurate with their expected impact.

We don’t know what the future of artificial intelligence will look like. Though some may make educated guesses, the future is unclear.

AI could keep developing like all other technologies, helping us transition from one era into a new one. Many, if not all, AI researchers hope it could help us transform into a healthier, more intelligent, peaceful society. But it’s important to remember that AI is a tool and, as such, not inherently good or bad. As with any other technology or tool, there could be unintended consequences. Rarely do people actively attempt to crash their cars or smash their thumbs with hammers, yet both happen all the time.

A concern is that as technology becomes more advanced, it can affect more people. A poorly swung hammer is likely to only hurt the person holding the nail. A car accident can harm passengers and drivers in both cars, as well as pedestrians. A plane crash can kill hundreds of people. Now, automation threatens millions of jobs — and while presumably no lives will be lost as a direct result, mass unemployment can have devastating consequences.

And job automation is only the beginning. When AI becomes very general and very powerful, aligning it with human interests will be challenging. If we fail, AI could plausibly become an existential risk for humanity.

Given the expectation that advanced AI will far surpass any technology seen to date — and possibly surpass even human intelligence — how can we predict and prepare for the risks to humanity?

To consider the Risks Principle, I turned to six AI researchers and philosophers.

### Non-zero Probability

An important aspect of considering the risk of advanced AI is recognizing that the risk exists and that it should be taken into account.

As Roman Yampolskiy, an associate professor at the University of Louisville, explained, “Even a small probability of existential risk becomes very impactful once multiplied by all the people it will affect. Nothing could be more important than avoiding the extermination of humanity.”
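The multiplication Yampolskiy describes can be written out as a short sketch. Everything numeric here is a hypothetical placeholder chosen for illustration, not an estimate from Yampolskiy or anyone else interviewed, and `expected_impact` is simply a name picked for this example.

```python
def expected_impact(probability: float, people_affected: int) -> float:
    """Return probability times the number of people affected."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * people_affected


# Hypothetical inputs, used only to show how the product behaves.
WORLD_POPULATION = 7_500_000_000  # rough assumption, not a sourced figure

# A fairly likely event that touches few people (e.g. one plane).
LOCAL_ACCIDENT = expected_impact(0.01, 300)

# A very unlikely event that would touch everyone.
GLOBAL_RISK = expected_impact(0.000001, WORLD_POPULATION)

print(f"local accident: about {LOCAL_ACCIDENT:.0f} people in expectation")
print(f"global risk:    about {GLOBAL_RISK:.0f} people in expectation")
```

With these made-up inputs, the remote global event has an expected toll in the thousands while the much likelier local accident has an expected toll of about three, which is the sense in which a small probability can still be "very impactful."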

This is “a very reasonable principle,” said Bart Selman, a professor at Cornell University. He explained, “I sort of refer to some of the discussions between AI scientists who might differ in how big they think that risk is. I’m quite certain it’s not zero, and the impact could be very high. So … even if these things are still far off and we’re not clear if we’ll ever reach them, even with a small probability of a very high consequence we should be serious about these issues. And again, not everybody, but the subcommunity should.”

Anca Dragan, an assistant professor at UC Berkeley, was more specific about her concerns. “An immediate risk is agents producing unwanted, surprising behavior,” she explained. “Even if we plan to use AI for good, things can go wrong, precisely because we are bad at specifying objectives and constraints for AI agents. Their solutions are often not what we had in mind.”

### Considering Other Risks

While most people I spoke with interpreted this Principle to address longer-term risks of AI, Dan Weld, a professor at the University of Washington, took a more nuanced approach.

“How could I disagree?” he asked. “Should we ignore the risks of any technology and not take precautions? Of course not. So I’m happy to endorse this one. But it did make me uneasy, because there is again an implicit premise that AI systems have a significant probability of posing an existential risk.”

But then he added, “I think what’s going to happen is – long before we get superhuman AGI – we’re going to get superhuman artificial *specific* intelligence. … These narrower kinds of intelligence are going to be at the superhuman level long before a *general* intelligence is developed, and there are many challenges that accompany these more narrowly described intelligences.”

“One technology,” he continued, “that I wish [was] discussed more is explainable machine learning. Since machine learning is at the core of pretty much every AI success story, it’s really important for us to be able to understand *what* it is that the machine learned. And, of course, with deep neural networks it is notoriously difficult to understand what they learned. I think it’s really important for us to develop techniques so machines can explain what they learned so humans can validate that understanding. … Of course, we’ll need explanations before we can trust an AGI, but we’ll need it long before we achieve general intelligence, as we deploy much more limited intelligent systems. For example, if a medical expert system recommends a treatment, we want to be able to ask, ‘Why?’

“Narrow AI systems, foolishly deployed, could be catastrophic. I think the immediate risk is less a function of the intelligence of the system than it is about the system’s autonomy, specifically the power of its effectors and the type of constraints on its behavior. Knight Capital’s automated trading system is much less intelligent than Google Deepmind’s AlphaGo, but the former lost $440 million in just forty-five minutes. AlphaGo hasn’t and can’t hurt anyone. … And don’t get me wrong – I think it’s important to have some people thinking about problems surrounding AGI; I applaud supporting that research. But I do worry that it distracts us from some other situations which seem like they’re going to hit us much sooner and potentially cause calamitous harm.”

### Open to Interpretation

Still others I interviewed worried about how the Principle might be interpreted, and suggested reconsidering word choices, or rewriting the principle altogether.

Patrick Lin, an Associate Professor at California Polytechnic State University, believed that the Principle is too ambiguous.

He explained, “This sounds great in ‘principle,’ but you need to work it out. For instance, it could be that there’s this catastrophic risk that’s going to affect everyone in the world. It could be AI or an asteroid or something, but it’s a risk that will affect everyone. But the probabilities are tiny — 0.000001 percent, let’s say. Now if you do an expected utility calculation, these large numbers are going to break the formula every time. There could be some AI risk that’s truly catastrophic, but so remote that if you do an expected utility calculation, you might be misled by the numbers.”

“I agree with it in general,” Lin continued, “but part of my issue with this particular phrasing is the word ‘commensurate.’ Commensurate meaning an appropriate level that correlates to its severity. So I think how we define commensurate is going to be important. Are we looking at the probabilities?
Are we looking at the level of damage? Or are we looking at expected utility? The different ways you look at risk might point you to different conclusions. I’d be worried about that. We can imagine all sorts of catastrophic risks from AI or robotics or genetic engineering, but if the odds are really tiny, and you still want to stick with this expected utility framework, these large numbers might break the math. It’s not always clear what the right way is to think about risk and a proper response to it.”

Meanwhile Nate Soares, the Executive Director of the Machine Intelligence Research Institute, suggested that the Principle should be more specific.

Soares said, “The principle seems too vague. … Maybe my biggest concern with it is that it leaves out questions of tractability: the attention we devote to risks shouldn’t actually be proportional to the risks’ expected impact; it should be proportional to the expected usefulness of the attention. There are cases where we should devote more attention to smaller risks than to larger ones, because the larger risk isn’t really something we can make much progress on. (There are also two separate and additional claims, namely ‘also we should avoid taking actions with appreciable existential risks whenever possible’ and ‘many methods (including the default methods) for designing AI systems that are superhumanly capable in the domains of cross-domain learning, reasoning, and planning pose appreciable existential risks.’ Neither of these is explicitly stated in the principle.)

“If I were to propose a version of the principle that has more teeth, as opposed to something that quickly mentions ‘existential risk’ but doesn’t give that notion content or provide a context for interpreting it, I might say something like: ‘The development of machines with par-human or greater abilities to learn and plan across many varied real-world domains, if mishandled, poses enormous global accident risks.
The task of developing this technology therefore calls for extraordinary care. We should do what we can to ensure that relations between segments of the AI research community are strong, collaborative, and high-trust, so that researchers do not feel pressured to rush or cut corners on safety and security efforts.’”

### What Do You Think?

How can we prepare for the potential risks that AI might pose? How can we address longer-term risks without sacrificing research for shorter-term risks? Human history is rife with learning from mistakes, but in the case of the catastrophic and existential risks that AI could present, we can’t allow for error – but how can we plan for problems we don’t know how to anticipate? AI safety research is critical to identifying unknown unknowns, but is there more the AI community or the rest of society can do to help mitigate potential risks?

This article is part of a weekly series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the weekly discussions about previous principles here.

## The AI Debate Must Stay Grounded in Reality

The following article was written by Vincent Conitzer and originally posted in Prospect Magazine.

Progress in artificial intelligence has been rapid in recent years. Computer programs are dethroning humans in games ranging from Jeopardy to Go to poker. Self-driving cars are appearing on roads. AI is starting to outperform humans in image and speech recognition.

With all this progress, a host of concerns about AI’s impact on human societies have come to the forefront. How should we design and regulate self-driving cars and similar technologies?
Will AI leave large segments of the population unemployed? Will AI have unintended sociological consequences? (Think about algorithms that accurately predict which news articles a person will like resulting in highly polarised societies, or algorithms that predict whether someone will default on a loan or commit another crime becoming racially biased due to the input data they are given.) Will AI be abused by oppressive governments to sniff out and stifle any budding dissent? Should we develop weapons that can act autonomously? And should we perhaps even be concerned that AI will eventually become “superintelligent”—intellectually more capable than human beings in every important way—making us obsolete or even extinct? While this last concern was once purely in the realm of science fiction, notable figures including Elon Musk, Bill Gates, and Stephen Hawking, inspired by Oxford philosopher Nick Bostrom’s Superintelligence book, have recently argued it needs to be taken seriously.

These concerns are mostly quite distinct from each other, but they all rely on the premise of technical advances in AI. Actually, in all cases but the last one, even just currently demonstrated AI capabilities justify the concern to some extent, but further progress will rapidly exacerbate it. And further progress seems inevitable, both because there do not seem to be any fundamental obstacles to it and because large amounts of resources are being poured into AI research and development.

The concerns feed off each other and a community of people studying the risks of AI is starting to take shape. This includes traditional AI researchers—primarily computer scientists—as well as people from other disciplines: economists studying AI-driven unemployment, legal scholars debating how best to regulate self-driving cars, and so on. A conference on “Beneficial AI” held in California in January brought a sizeable part of this community together.
The topics covered reflected the diversity of concerns and interests. One moment, the discussion centered on which communities are disproportionately affected by their jobs being automated; the next moment, the topic was whether we should make sure that super-intelligent AI has conscious experiences.

The mixing together of such short- and long-term concerns does not sit well with everyone. Most traditional AI researchers are reluctant to speculate about whether and when we will attain truly human-level AI: current techniques still seem a long way off this and it is not clear what new insights would be able to close the gap. Most of them would also rather focus on making concrete technical progress than get mired down in philosophical debates about the nature of consciousness. At the same time, most of these researchers are willing to take seriously the other concerns, which have a concrete basis in current capabilities.

Is there a risk that speculation about super-intelligence, often sounding like science fiction more than science, will discredit the larger project of focusing on the societally responsible development of real AI? And if so, is it perhaps better to put aside any discussion of super-intelligence for now?

While I am quite sceptical of the idea that truly human-level AI will be developed anytime soon, overall I think that the people worried about this deserve a place at the table in these discussions. For one, some of the most surprisingly impressive recent technical accomplishments have come from people who are very bullish on what AI can achieve. Even if it turns out that we are still nowhere close to human-level AI, those who imagine that we are could contribute useful insights into what might happen in the medium-term.

I think there is value even in thinking about some of the very hard philosophical questions, such as whether AI could ever have subjective experiences, whether there is something it would be like to be a highly advanced AI system.
(See also my earlier Prospect article.) Besides casting an interesting new light on some ancient questions, the exercise is likely to inform future societal debates. For example, we may imagine that in the future people will become attached to the highly personalised and anthropomorphised robots that care for them in old age, and demand certain rights for these robots after they pass away. Should such rights be granted? Should such sentiments be avoided?

At the same time, the debate should obviously not exclude or turn off people who genuinely care about the short-term concerns while being averse to speculation about the long-term, especially because most real AI researchers fall in this last category. Besides contributing solutions to the short-term concerns, their participation is essential to ensure that the longer-term debate stays grounded in reality. Research communities work best when they include people with different views and different sub-interests. And it is hard to imagine a topic for which this is truer than the impact of AI on human societies.

Note from FLI: Among our objectives is to inspire discussion and a sharing of ideas. As such, we post op-eds that we believe will help spur discussion within our community. Op-eds do not necessarily represent FLI’s opinions or views.

## Artificial Intelligence and Income Inequality

Shared Prosperity Principle: The economic prosperity created by AI should be shared broadly, to benefit all of humanity.

Income inequality is a well recognized problem. The gap between the rich and poor has grown over the last few decades, but it became increasingly pronounced after the 2008 financial crisis. While economists debate the extent to which technology plays a role in global inequality, most agree that tech advances have exacerbated the problem.

In an interview with the MIT Tech Review, economist Erik Brynjolfsson said, “My reading of the data is that technology is the main driver of the recent increases in inequality.
It’s the biggest factor.”

Which raises the question: what happens as automation and AI technologies become more advanced and capable?

Artificial intelligence can generate great value by providing services and creating products more efficiently than ever before. But many fear this will lead to an even greater disparity between the wealthy and the rest of the world.

AI expert Yoshua Bengio suggests that equality and ensuring a shared benefit from AI could be pivotal in the development of safe artificial intelligence. Bengio, a professor at the University of Montreal, explains, “In a society where there’s a lot of violence, a lot of inequality, [then] the risk of misusing AI or having people use it irresponsibly in general is much greater. Making AI beneficial for all is very central to the safety question.”

In fact, when speaking with many AI experts across academia and industry, the consensus was unanimous: the development of AI cannot benefit only the few.

### Broad Agreement

“It’s almost a moral principle that we should share benefits among more people in society,” argued Bart Selman, a professor at Cornell University. “I think it’s now down to eight people who have as much as half of humanity. These are incredible numbers, and of course if you look at that list it’s often technology pioneers that own that half. So we have to go into a mode where we are first educating the people about what’s causing this inequality and acknowledging that technology is part of that cost, and then society has to decide how to proceed.”

Guruduth Banavar, Vice President of IBM Research, agreed with the Shared Prosperity Principle, but said, “It needs rephrasing. This is broader than AI work. Any AI prosperity should be available for the broad population. Everyone should benefit and everyone should find their lives changed for the better. This should apply to all technology – nanotechnology, biotech – it should all help to make life better.
But I’d write it as ‘prosperity created by AI should be available as an opportunity to the broadest population.’”

Francesca Rossi, a research scientist at IBM, added, “I think [this principle is] very important. And it also ties in with the general effort and commitment by IBM to work a lot on education and re-skilling people to be able to engage with the new technologies in the best way. In that way people will be more able to take advantage of all the potential benefits of AI technology. That also ties in with the impact of AI on the job market and all the other things that are being discussed. And they are very dear to IBM as well, in really helping people to benefit the most out of the AI technology and all the applications.”

Meanwhile, Stanford’s Stefano Ermon believes that research could help ensure greater equality. “It’s very important that we make sure that AI is really for everybody’s benefit,” he explained, “that it’s not just going to be benefitting a small fraction of the world’s population, or just a few large corporations. And I think there is a lot that can be done by AI researchers just by working on very concrete research problems where AI can have a huge impact. I’d really like to see more of that research work done.”

### A Big Challenge

“AI is having incredible successes and becoming widely deployed. But this success also leads to a big challenge,” said Dan Weld, a professor at the University of Washington. “[That is] its impending potential to increase productivity to the point where many people may lose their jobs. As a result, AI is likely to dramatically increase income disparity, perhaps more so than other technologies that have come about recently. If a significant percentage of the populace loses employment, that’s going to create severe problems, right?
We need to be thinking about ways to cope with these issues, very seriously and soon.”

Berkeley professor Anca Dragan summed up the problem when she asked, “If all the resources are automated, then who actually controls the automation? Is it everyone or is it a few select people?”

“I’m really concerned about AI worsening the effects and concentration of power and wealth that we’ve seen in the last 30 years,” Bengio added.

“It’s a real fundamental problem facing our society today, which is the increasing inequality and the fact that prosperity is not being shared around,” explained Toby Walsh, a professor at UNSW Australia.

“This is fracturing our societies and we see this in many places, in Brexit, in Trump,” Walsh continued. “A lot of dissatisfaction within our societies. So it’s something that we really have to fundamentally address. But again, this doesn’t seem to me something that’s really particular to AI. I think really you could say this about most technologies. … although AI is going to amplify some of these increasing inequalities. If it takes away people’s jobs and only leaves wealth in the hands of those people owning the robots, then that’s going to exacerbate some trends that are already happening.”

Kay Firth-Butterfield, the Executive Director of AI-Austin.org, also worries that AI could exacerbate an already tricky situation. “AI is a technology with such great capacity to benefit all of humanity,” she said, “but also the chance of simply exacerbating the divides between the developed and developing world, and the haves and have nots in our society. To my mind that is unacceptable and so we need to ensure, as Elon Musk said, that AI is truly democratic and its benefits are available to all.”

“Given that all the jobs (physical and mental) will be gone, [shared prosperity] is the only chance we have to be provided for,” added University of Louisville professor Roman Yampolskiy.

### What Do You Think?

Given current tech trends, is it reasonable to assume that AI will exacerbate today’s inequality issues? Will this lead to increased AI safety risks? How can we change the societal mindset that currently discourages a greater sharing of wealth? Or is that even a change we should consider?

This article is part of a weekly series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the weekly discussions about previous principles here.

## MIRI March 2017 Newsletter

Research updates

General updates

- Why AI Safety?: A quick summary (originally posted during our fundraiser) of the case for working on AI risk, including notes on distinctive features of our approach and our goals for the field.
- Nate Soares attended “Envisioning and Addressing Adverse AI Outcomes,” an event pitting red-team attackers against defenders in a variety of AI risk scenarios.
- We also attended an AI safety strategy retreat run by the Center for Applied Rationality.

News and links

## How Self-Driving Cars Use Probability

Even though human drivers don’t consciously think in terms of probabilities, we observe our environment and make decisions based on the likelihood of certain things happening. A driver doesn’t calculate the probability that the sports car behind her will pass her, but through observing the car’s behavior and considering similar situations in the past, she makes her best guess.

We trust probabilities because it is the only way to take action in the midst of uncertainty.

Autonomous systems such as self-driving cars will make similar decisions based on probabilities, but through a different process.
Unlike a human who trusts intuition and experience, these autonomous cars calculate the probability of certain scenarios using data collectors and reasoning algorithms.

### How to Determine Probability

Stefano Ermon, a computer scientist at Stanford University, wants to make self-driving cars and autonomous systems safer and more reliable by improving the way they reason probabilistically about their environment. He explains, “The challenge is that you have to take actions and you don’t know what will happen next. Probabilistic reasoning is just the idea of thinking about the world in terms of probabilities, assuming that there is uncertainty.”

There are two main components to achieve safety. First, the computer model must collect accurate data, and second, the reasoning system must be able to draw the right conclusions from the model’s data.

Ermon explains, “You need both: to build a reliable model you need a lot of data, and then you need to be able to draw the right conclusions based on the model, and that requires the artificial intelligence to think about these models accurately. Even if the model is right, but you don’t have a good way to reason about it, you can do catastrophic things.”

For example, in the context of autonomous vehicles, models use various sensors to observe the environment and collect data about countless variables, such as the behavior of the drivers around you, potholes and other obstacles in front of you, weather conditions—every possible data point.

A reasoning system then interprets this data. It uses the model’s information to decide whether the driver behind you is dangerously aggressive, if the pothole ahead will puncture your tire, if the rain is obstructing visibility, and the system continuously changes the car’s behavior to respond to these variables.

Consider the aggressive driver behind you. As Ermon explains, “Somehow you need to be able to reason about these models. You need to come up with a probability.
You don’t know what the car’s going to do but you can estimate, and based on previous behavior you can say this car is likely to cut the line because it has been driving aggressively.”

### Improving Probabilistic Reasoning

Ermon is creating strong algorithms that can synthesize all of the data that a model produces and make reliable decisions.

As models improve, they collect more information and capture more variables relevant to making these decisions. But as Ermon notes, “the more complicated the model is, the more variables you have, the more complicated it becomes to make the optimal decisions based on the model.”

Thus as the data collection expands, the analysis must also improve. The artificial intelligence in these cars must be able to reason with this increasingly complex data.

And this reasoning can easily go wrong. “You need to be very precise when computing these probabilities,” Ermon explains. “If the probability that a car cuts into your lane is 0.1, but you completely underestimate it and say it’s 0.01, you might end up making a fatal decision.”

To avoid fatal decisions, the artificial intelligence must be robust, but the data must also be complete. If the model collects incomplete data, “you have no guarantee that the number that you get when you run this algorithm has anything to do with the actual probability of that event,” Ermon explains.

The model and the algorithm entirely depend on each other to produce the optimal decision. If the model is incomplete and fails to capture the black ice in front of you, no reasoning system will be able to make a safe decision. And even if the model captures the black ice and every other possible variable, if the reasoning system cannot handle the complexity of this data, again the car will fail.

### How Safe Will Autonomous Systems Be?

The technology in self-driving cars has made huge leaps lately, and Ermon is hopeful.
“Eventually, as computers get better and algorithms get better and the models get better, hopefully we’ll be able to prevent all accidents,” he suggests.

However, there are still fundamental limitations on probabilistic reasoning. “Most computer scientists believe that it is impossible to come up with the silver bullet for this problem, an optimal algorithm that is so powerful that it can reason about all sorts of models that you can think about,” Ermon explains. “That’s the key barrier.”

But despite this barrier, self-driving cars will soon be available for consumers. Ford, for one, has promised to put its self-driving cars on the road by 2021. And while most computer scientists expect these cars to be far safer than human drivers, their success depends on their ability to reason probabilistically about their environment.

As Ermon explains, “You need to be able to estimate these kinds of probabilities because they are the building blocks that you need to make decisions.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

## Is an AI Arms Race Inevitable?

AI Arms Race Principle: An arms race in lethal autonomous weapons should be avoided.*

Perhaps the scariest aspect of the Cold War was the nuclear arms race. At its peak, the US and Russia held over 70,000 nuclear weapons, only a fraction of which could have killed every person on earth. As the race to create increasingly powerful artificial intelligence accelerates, and as governments increasingly test AI capabilities in weapons, many AI experts worry that an equally terrifying AI arms race may already be under way.
In fact, at the end of 2015, the Pentagon requested $12-15 billion for AI and autonomous weaponry for the 2017 budget, and the Deputy Defense Secretary at the time, Robert Work, admitted that he wanted “our competitors to wonder what’s behind the black curtain.” Work also said that the new technologies were “aimed at ensuring a continued military edge over China and Russia.”

But the US does not have a monopoly on this technology, and many fear that countries with lower safety standards could quickly pull ahead. Without adequate safety in place, autonomous weapons could be more difficult to control, create even greater risk of harm to innocent civilians, and more easily fall into the hands of terrorists, dictators, reckless states, or others with nefarious intentions.

Anca Dragan, an assistant professor at UC Berkeley, described the possibility of such an AI arms race as “the equivalent of very cheap and easily accessible nuclear weapons.”

“And that would not fare well for us,” Dragan added.

Unlike nuclear weapons, this new class of WMD can potentially target by traits like race or even by what people have liked on social media.

### Lethal Autonomous Weapons

Toby Walsh, a professor at UNSW Australia, took the lead on the 2015 autonomous weapons open letter, which calls for a ban on lethal autonomous weapons and has been signed by over 20,000 people. With regard to that letter and the AI Arms Race Principle, Walsh explained:

“One reason that I got involved in these discussions is that there are some topics I think are very relevant today, and one of them is the arms race that’s happening amongst militaries around the world already, today. This is going to be very destabilizing. It’s going to upset the current world order when people get their hands on these sorts of technologies.
It’s actually stupid AI that they’re going to be fielding in this arms race to begin with and that’s actually quite worrying – that it’s technologies that aren’t going to be able to distinguish between combatants and civilians, and aren’t able to act in accordance with international humanitarian law, and will be used by despots and terrorists and hacked to behave in ways that are completely undesirable. And that’s something that’s happening today.”

When asked about his take on this Principle, University of Montreal professor Yoshua Bengio pointed out that he had signed the autonomous weapons open letter, which basically “says it all” about his concerns of a potential AI arms race.

### Details and Definitions

In addition to worrying about the risks of a race, Dragan also expressed a concern over “what to do about it and how to avoid it.”

“I assume international treaties would have to occur here,” she said.

Dragan’s not the only one expecting international treaties. The UN recently agreed to begin formal discussions that will likely lead to negotiations on an autonomous weapons ban or restrictions. However, as with so many things, the devil will be in the details.

In reference to an AI arms race, Cornell professor Bart Selman stated, “It should be avoided.” But he also added, “There’s a difference between it ‘should’ be avoided and ‘can’ it be avoided – that may be a much harder question.”

Selman would like to see “the same kinds of discussions as there were around atomic weapons or biological weapons, where people actually start to look at the tradeoffs and the risks of an arms race.”

“That discussion has to be had,” he said, “and it may actually bring people together in a positive way. Countries could get together and say this is not a good development and we should limit it and avoid it.
So to bring it out as a principle, I think the main value there is that we need to have the discussion as a society and with other countries.” Dan Weld, a professor at the University of Washington, also worries that simply saying an arms race should be avoided is insufficient. “I fervently hope we don’t see an arms race in lethal autonomous weapons,” Weld explained. “That said, this principle bothered me, because it doesn’t seem to have any operational form. Specifically, an arms race is a dynamic phenomenon that happens when you’ve got multiple agents interacting. It takes two people to race. So whose fault is it if there is a race? I’m worried that both participants will point a finger at the other and say, ‘Hey, I’m not racing! Let’s not have a race, but I’m going to make my weapons more accurate and we can avoid a race if you just relax.’ So what force does the principle have?” ### General Consensus Though preventing an AI arms race may be tricky, there seems to be general consensus that a race would be bad and should be avoided. “Weaponized AI is a weapon of mass destruction and an AI arms race is likely to lead to an existential catastrophe for humanity,” said Roman Yampolskiy, a professor at the University of Louisville. Kay Firth-Butterfield, the Executive Director of AI-Austin.org, explained, “Any arms race should be avoided but particularly this one where the stakes are so high and the possibility of such weaponry, if developed, being used within domestic policing is so terrifying.” But Stanford professor Stefano Ermon may have summed it up best when he said, “Even just with the capabilities we have today it’s not hard to imagine how [AI] could be used in very harmful ways. I don’t want my contributions to the field and any kind of techniques that we’re all developing to do harm to other humans or to develop weapons or to start wars or to be even more deadly than what we already have.” ### What do you think? Is an AI arms race inevitable? 
How can it be prevented? Can we keep autonomous weapons out of the hands of dictators and terrorists? How can companies and governments work together to build beneficial AI without allowing the technology to be used to create what could be the deadliest weapons the world has ever seen?

This article is part of a weekly series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the weekly discussions about previous principles here.

*The AI Arms Race Principle specifically addresses lethal autonomous weapons. Later in the series, we’ll discuss the Race Avoidance Principle, which will look at the risks of companies racing to create AI technology.

## Using Machine Learning to Address AI Risk

The following article and talk are by Jessica Taylor and they were originally posted on MIRI.

At the EA Global 2016 conference, I gave a talk on “Using Machine Learning to Address AI Risk”:

It is plausible that future artificial general intelligence systems will share many qualities in common with present-day machine learning systems. If so, how could we ensure that these systems robustly act as intended? We discuss the technical agenda for a new project at MIRI focused on this question.

A recording of my talk is now up online. The talk serves as a quick survey (for a general audience) of the kinds of technical problems we’re working on under the “Alignment for Advanced ML Systems” research agenda. Included below is a version of the talk in blog post form.1

### Goal of this research agenda

This talk is about a new research agenda aimed at using machine learning to make AI systems safe even at very high capability levels. I’ll begin by summarizing the goal of the research agenda, and then go into more depth on six problem classes we’re focusing on.

The goal statement for this technical agenda is that we want to know how to train a smarter-than-human AI system to perform one or more large-scale, useful tasks in the world.

Some assumptions this research agenda makes:

1. Future AI systems are likely to look like more powerful versions of present-day ML systems in many ways. We may get better deep learning algorithms, for example, but we’re likely to still be relying heavily on something like deep learning.2
2. Artificial general intelligence (AGI) is likely to be developed relatively soon (say, in the next couple of decades).3
3. Building task-directed AGI is a good idea, and we can make progress today studying how to do so.

I’m not confident that all three of these assumptions are true, but I think they’re plausible enough to deserve about as much attention from the AI community as the likeliest alternative scenarios.

A task-directed AI system is a system that pursues a semi-concrete objective in the world, like “build a million houses” or “cure cancer.” For those who have read Superintelligence, task-directed AI is similar to the idea of genie AI. Although these tasks are kind of fuzzy (there’s probably a lot of work you’d need to do to clarify what it really means to build a million houses, or what counts as a good house), they’re at least somewhat concrete.

An example of an AGI system that isn’t task-directed would be one with a goal like “learn human values and do things humans would consider good upon sufficient reflection.” This is too abstract to count as a “task” in the sense we mean; it doesn’t directly cash out in things in the world.
The hope is that even though task-directed AI pursues a less ambitious objective than “learn human values and do what we’d want it to do,” it’s still sufficient to prevent global catastrophic risks. Once the immediate risks are averted, we can then work on building more ambitious AI systems under reduced time pressure.

Task-directed AI uses some (moderate) amount of human assistance to clarify the goal and to evaluate and implement its plans. A goal like “cure cancer” is vague enough that humans will have to do some work to clarify what they mean by it, though most of the intellectual labor should be coming from the AI system rather than from humans.

Ideally, task-directed AI also shouldn’t require significantly more computational resources than competing systems. You shouldn’t get an exponential slowdown from building a safe system vs. a generic system.

In order to think about this overall goal, we need some kind of model for these future systems. The general approach that I’ll take is to look at current systems and imagine that they’re more powerful. A lot of the time you can look at tasks that people do in ML and you can see that the performance improves over time. We’ll model more advanced AI systems by just supposing that systems will continue to achieve higher scores in ML tasks. We can then ask what kinds of failure modes are likely to arise as systems improve, and what we can work on today to make those failures less likely or less costly.

### Six potential problems with highly capable AI systems

##### Problem 1: Actions are hard to evaluate

Suppose an AI system composes a story, and a human gives the system a reward based on how good the story is.4 This is similar to some reinforcement learning (RL) tasks: the agent wants to do something that will cause it to receive a high reward in the future. The formalism of RL would say that the objective of this RL agent is to write a story that the human is expected to give a high score to.
For this objective to actually help us receive very high-quality stories, however, we also need to know that the human understands the RL agent’s actions well enough to correctly administer rewards. This assumption seems less likely to hold for systems that are optimizing the objective much more powerfully than any present-day system. For example: • A system much smarter than a human may be able to manipulate or coerce the human into giving a bad story a high score. • Even if the system is less intelligent than that, it might resort to plagiarism. Plagiarism can be easier to generate than to detect, since detection often requires scouring a larger pool of source texts. • A subhuman system might also have an advantage in inserting steganography into the story; it might take polynomial time to embed a secret message, and exponential time to detect such a message. Finding a way to discourage agents from taking covert actions like these would make it easier to monitor those actions’ effects and keep operators in the loop. Do we have a general way of preventing this? Can we train an RL system to not only output an action (e.g., a story), but also a report that might help an overseer better evaluate the system’s performance? Following OpenAI researcher Paul Christiano, we call this the problem of informed oversight.5 ##### Problem 2: Ambiguous test examples Another problem: Consider a classifier trained to distinguish images of cats from images not containing cats, or trained to detect cancer. You may have lots of life experience that tells you “wild cats are cats.” If the training set only contains images of house cats and dogs, however, then it may not be possible to infer this fact during training. An AI system that was superhumanly good at classifying images from a particular data set might not construct the same generalizations as a human, making it unreliable in new environments. 
In safety-critical settings, ideally we would like the classifier to say, “This is ambiguous,” to alert us that the image’s label is underdetermined by the labels of training set images. We could then leverage the classifier’s proficiency at classification to intervene in contexts where the system is relatively likely to misclassify things, and could also supply training data that’s tailored to the dimensions along which the original data was uninformative. Formalizing this goal is the problem of inductive ambiguity detection.

##### Problem 3: Difficulty imitating human behavior

One way we might want to deal with the problems of smarter-than-human AI is to just have it imitate a human. By having the system observe a human attempting to perform some task and having it learn to behave similarly, we might be able to leverage the AI system’s cognitive advantages while having it mostly stick to fairly normal strategies.

As an example, we can consider an AI system tasked with producing the kind of picture a human would draw. How could you actually train a system on this objective?

In one approach, generative adversarial networks, you have a human and two AI agents: an imitator, and a distinguisher.6 The distinguisher tries to guess whether a given picture came from the human or from the imitator, and the imitator tries to produce the kinds of images that the distinguisher classifies as coming from the human.

This is a pretty cool setup, but it raises some theoretical issues. We could ask the question, “Does the distinguisher have to be smarter than the imitator? And if so, how much smarter?” If the distinguisher is less smart than the imitator, then the imitator could covertly do very non-human-like things to the image, as long as they’re too subtle for the distinguisher to notice. An open avenue of research here is to analyze schemes like this and ask under what conditions we can get nice robustness guarantees.
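The worry about a distinguisher that is weaker than the imitator can be made concrete with a toy example. The sketch below is not a generative adversarial network and involves no training; it only assumes, for illustration, that "images" are short lists of brightness values and that a weak distinguisher checks a single statistic. The function names and the sample data are hypothetical.

```python
from statistics import mean, pstdev


def weak_distinguisher(image, human_reference, tol=0.05):
    """Accept an image as human-made if its mean brightness matches the reference.

    This distinguisher is deliberately limited: it looks at one statistic only.
    """
    return abs(mean(image) - mean(human_reference)) <= tol


def stronger_distinguisher(image, human_reference, tol=0.05):
    """Also compare the spread of brightness values, not just the mean."""
    same_mean = abs(mean(image) - mean(human_reference)) <= tol
    same_spread = abs(pstdev(image) - pstdev(human_reference)) <= tol
    return same_mean and same_spread


# Hypothetical data: a smooth "human" drawing and a harsh checkerboard-like
# imitation that happens to have the same average brightness.
human_drawing = [0.4, 0.5, 0.6, 0.5]
imitation = [0.0, 1.0, 0.0, 1.0]

if __name__ == "__main__":
    print("weak accepts imitation:", weak_distinguisher(imitation, human_drawing))
    print("stronger accepts imitation:", stronger_distinguisher(imitation, human_drawing))
```

The imitation passes the weak check while differing in a way the weak distinguisher never measures, which is the kind of gap a robustness guarantee would need to rule out.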
For more on this problem, see Paul Christiano’s “Act-Based Agents” and “Mimicry and Meeting Halfway”. ##### Problem 4: Difficulty specifying goals about the real world What if we want to train our AI system to perform a task like making a sandwich? One option might be to hook the system up to a robot in a room full of sandwich ingredients, have it perform an action sequence, and then have a human observer rate the robot’s performance based on how close it came to making a sandwich. That rating determines the robot’s reward. We previously noted that sufficiently capable RL agents might pick actions that are hard to evaluate. Here we face the additional problem that useful tasks will often require taking physical action in the world. If the system is capable enough, then this setup gives it an incentive to take away the reward button and press it itself. This is what the formalism of RL would tell you is the best action, if we imagine AI systems that continue to be trained in the RL framework far past current capability levels. A natural question, then, is whether we can train AI systems that just keep getting better at producing a sandwich as they improve in capabilities, without ever reaching a tipping point where they have an incentive to do something else. Can we avoid relying on proxies for the task we care about, and just train the system to value completing the task in its own right? This is the generalizable environmental goals problem. ##### Problem 5: Negative side-effects Suppose we succeeded in making a system that wants to put a sandwich in the room. In choosing between plans, it will favor whichever plan has the higher probability of resulting in a sandwich. Perhaps the policy of just walking over and making a sandwich has a 99.9% chance of success; but there’s always a chance that a human could step in and shut off the robot. 
A policy that drives down the probability of interventions like that might push up the probability of the room ending up containing a sandwich to 99.9999%. In this way, sufficiently advanced ML systems can end up with incentives to interfere with their developers and operators even when there’s no risk of reward hacking.

This is the problem of designing task-directed systems that can become superhumanly good at achieving their task, without causing negative side-effects in the process.

One response to this problem is to try to quantify how much total impact different policies have on the world. We can then add a penalty term for actions that have a high impact, causing the system to favor low-impact strategies.

Another approach is to ask how we might design an AI system to be satisfied with a merely 99.9% chance of success: just have the system stop trying to think up superior policies once it finds one meeting that threshold. This is the problem of formalizing mild optimization.

Or one can consider advanced AI systems from the perspective of convergent instrumental strategies. No matter what the system is trying to do, it can probably benefit by having more computational resources, by having the programmers like it more, by having more money. A sandwich-making system might want money so it can buy more ingredients, whereas a story-writing system might want money so it can buy books to learn from. Many different goals imply similar instrumental strategies, a number of which are likely to introduce conflicts due to resource limitations.

One approach, then, would be to study these instrumental strategies directly and try to find a way to design a system that doesn’t exhibit them. If we can identify common features of these strategies, and especially of the adversarial strategies, then we could try to proactively avert the incentives to pursue those strategies.
This seems difficult, and is very underspecified, but there’s some initial research pointed in this direction. ##### Problem 6: Edge cases that still satisfy the goal Another problem that’s likely to become more serious as ML systems become more advanced is edge cases. Consider our ordinary concept of a sandwich. There are lots of things that technically count as sandwiches, but are unlikely to have the same practical uses a sandwich normally has for us. You could have an extremely small or extremely large sandwich, or a toxic sandwich. For an example of this behavior in present-day systems, we can consider this image that an image classifier correctly classified as a panda (with 57% confidence). Goodfellow, Shlens, and Szegedy found that they could add a tiny vector to this image that causes the classifier to misclassify it as a gibbon with 99% confidence.7 Such edge cases are likely to become more common and more hazardous as ML systems begin to search wider solution spaces than humans are likely (or even able) to consider. This is then another case where systems might become increasingly good at maximizing their score on a conventional metric, while becoming less reliable for achieving realistic goals we care about. Conservative concepts are an initial idea for trying to address this problem, by biasing systems to avoid assigning positive classifications to examples that are near the edges of the search space. The system might then make the mistake of thinking that some perfectly good sandwiches are inadmissible, but it would not make the more risky mistake of classifying toxic or otherwise bizarre sandwiches as admissible. ### Technical details on one problem: inductive ambiguity identification I’ve outlined eight research directions for addressing six problems that seem likely to start arising (or to become more serious) as ML systems become better at optimizing their objectives — objectives that may not exactly match programmers’ intentions. 
These research directions and problems are discussed in more detail in “Alignment for Advanced ML Systems.” I’ll go into more technical depth on an example problem to give a better sense of what working on these problems looks like in practice.

##### KWIK learning

Let’s consider the inductive ambiguity identification problem, applied to a classifier for 2D points. In this case, we have 4 positive examples and 4 negative examples. When a new point comes in, the classifier could try to label it by drawing a whole bunch of models that are consistent with the previous data. Here, I draw just 4 of them. The new point, marked with a question mark, falls on opposite sides of these different models, suggesting that all of these models are plausible given the data.

We can suppose that the system infers from this that the training data is ambiguous with respect to the new point’s classification, and asks the human to label it. The human might then label it with a plus, and the system draws new conclusions about which models are plausible.

This approach is called “Knows What It Knows” learning, or KWIK learning. We start with some input space $X := \mathbb{R}^n$ and assume that there exists some true mapping from inputs to probabilities. E.g., for each image the cat classifier encounters we assume that there is a true answer in the set $Y := [0,1]$ to the question, “What is the probability that this image is a cat?” This probability corresponds to the probability that a human will label that image “1” as opposed to “0,” which we can represent as a weighted coin flip. The model maps the inputs to answers, which in this case are probabilities.8

The KWIK learner is going to play a game. At the beginning of the game, some true model $h^*$ gets picked out. The true model is assumed to be in the hypothesis set H. On each iteration $i$ some new example $x_i \in \mathbb{R}^n$ comes in. It has some true answer $y_i = h^*(x_i)$, but the learner is unsure about the true answer. The learner has two choices:

- Output an answer $\hat{y}_i \in [0,1]$.
  - If $|\hat{y}_i - y_i| > \varepsilon$, the learner then loses the game.
- Output ⊥ to indicate that the example is ambiguous.
  - The learner then gets to observe the true label $z_i = \mathrm{FlipCoin}(y_i)$ from the observation set $Z := \{0,1\}$.

The goal is to not lose, and to not output ⊥ too many times. The upshot is that it’s actually possible to win this game with a high probability if the hypothesis class H is a small finite set or a low-dimensional linear class. This is pretty cool. It turns out that there are certain forms of uncertainty where we can just resolve the ambiguity.

The way this works is that on each new input, we consider multiple models $h$ that have done well in the past, and we consider something “ambiguous” if the models disagree on $h(x_i)$ by more than ε. Then we just refine the set of models over time.

The way that a KWIK learner represents this notion of inductive ambiguity is: ambiguity is about not knowing which model is correct. There’s some set of models, many are plausible, and you’re not sure which one is the right model.

There are some problems with this. One of the main problems is KWIK learning’s realizability assumption, the assumption that the true model $h^*$ is actually in the hypothesis set H. Realistically, the actual universe won’t be in your hypothesis class, since your hypotheses need to fit in your head. Another problem is that this method only works for these very simple model classes.

##### A Bayesian view of the problem

That’s some existing work on inductive ambiguity identification. What’s some work we’ve been doing at MIRI related to this?

Lately, I’ve been trying to approach this problem from a Bayesian perspective. On this view, we have some kind of prior Q over mappings $X \to \{0,1\}$ from the input space to the label. The assumption we’ll make is that our prior is wrong in some way and there’s some unknown “true” prior P over these mappings.
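As a brief aside before continuing, the KWIK game from the previous subsection can be sketched in code. This is a minimal illustration under a simplifying assumption that the talk does not make: every hypothesis outputs exactly 0 or 1, so observed labels are deterministic rather than coin flips, and inconsistent hypotheses can simply be discarded. The class and function names (`EnumerationKWIK`, `make_threshold`, `run_game`) are hypothetical, chosen only for this example.

```python
from typing import Callable, List, Optional

Hypothesis = Callable[[float], float]


class EnumerationKWIK:
    """Keeps every hypothesis still consistent with the labels seen so far."""

    def __init__(self, hypotheses: List[Hypothesis], epsilon: float) -> None:
        self.plausible = list(hypotheses)
        self.epsilon = epsilon

    def predict(self, x: float) -> Optional[float]:
        """Return an answer, or None (standing in for ⊥) if x is ambiguous."""
        preds = [h(x) for h in self.plausible]
        if max(preds) - min(preds) > self.epsilon:
            return None  # plausible models disagree by more than epsilon
        # All plausible models agree to within epsilon; answer with the midpoint.
        return (max(preds) + min(preds)) / 2

    def observe(self, x: float, label: float) -> None:
        """Discard hypotheses that contradict the observed label (noise-free case)."""
        self.plausible = [h for h in self.plausible if h(x) == label]


def make_threshold(t: float) -> Hypothesis:
    """Hypothetical 1D model: label 1 for inputs at or above the threshold t."""
    return lambda x: 1.0 if x >= t else 0.0


def run_game(learner: EnumerationKWIK, true_h: Hypothesis, xs: List[float]) -> List[Optional[float]]:
    """Play the game; the learner only sees labels after outputting None."""
    outputs: List[Optional[float]] = []
    for x in xs:
        guess = learner.predict(x)
        if guess is None:
            learner.observe(x, true_h(x))
        elif abs(guess - true_h(x)) > learner.epsilon:
            raise RuntimeError("learner lost the game")
        outputs.append(guess)
    return outputs


if __name__ == "__main__":
    learner = EnumerationKWIK([make_threshold(t) for t in range(5)], epsilon=0.1)
    print(run_game(learner, make_threshold(2), [5.0, -1.0, 2.5, 1.5, 2.2, 0.5]))
```

Under the realizability assumption, each ambiguous round removes at least one hypothesis and never removes the true one, so this learner reports ambiguity at most |H| − 1 times. Handling the noisy, coin-flip labels described in the talk would need a statistical elimination rule instead of the exact consistency check used here.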
The goal is that even though the system only has access to Q, it should perform the classification task almost as well (in expectation over P) as if it already knew P.

It seems like this task is hard. If the real world is sampled from P, and P is different from your prior Q, there aren’t that many guarantees. To make this tractable, we can add a grain of truth assumption:

$$\forall f : Q(f) \geq \frac{1}{k} P(f)$$

This says that if P assigns a high probability to something, then so does Q. Can we get good performance in various classification tasks under this kind of assumption?

We haven’t completed this research avenue, but initial results suggest that it’s possible to do pretty well on this task while avoiding catastrophic behaviors in at least some cases (e.g., online supervised learning). That’s somewhat promising, and this is definitely an area for future research.

How this ties in to inductive ambiguity identification: If you’re uncertain about what’s true, then there are various ways of describing what that uncertainty is about. You can try taking your beliefs and partitioning them into various possibilities. That’s in some sense an ambiguity, because you don’t know which possibility is correct. We can think of the grain of truth assumption as saying that there’s some way of splitting up your probability distribution into components such that one of the components is right. The system should do well even though it doesn’t initially know which component is right.

(For more recent work on this problem, see Paul Christiano’s “Red Teams” and “Learning with Catastrophes” and research forum results from me and Ryan Carey: “Bias-Detecting Online Learners” and “Adversarial Bandit Learning with Catastrophes.”)

### Other research agendas

Let’s return to a broad view and consider other research agendas focused on long-run AI safety.
The first such agenda was outlined in MIRI’s 2014 agent foundations report.9 The agent foundations agenda is about developing a better theoretical understanding of reasoning and decision-making. An example of a relevant gap in our current theories is ideal reasoning about mathematical statements (including statements about computer programs), in contexts where you don’t have the time or compute to do a full proof. This is the basic problem we’re responding to in “Logical Induction.” In this talk I’ve focused on problems for advanced AI systems that broadly resemble present-day ML; in contrast, the agent foundations problems are agnostic about the details of the system. They apply to ML systems, but also to other possible frameworks for good general-purpose reasoning. Then there’s the “Concrete Problems in AI Safety” agenda.10 Here the idea is to study AI safety problems with a more empirical focus, specifically looking for problems that we can study using current ML methods, and perhaps can even demonstrate in current systems or in systems that might be developed in the near future. As an example, consider the question, “How do you make an RL agent that behaves safely while it’s still exploring its environment and learning about how the environment works?” It’s a question that comes up in current systems all the time, and is relatively easy to study today, but is likely to apply to more capable systems as well. These different agendas represent different points of view on how one might make AI systems more reliable in a way that scales with capabilities progress, and our hope is that by encouraging work on a variety of different problems from a variety of different perspectives, we’re less likely to completely miss a key consideration. At the same time, we can achieve more confidence that we’re on the right track when relatively independent approaches all arrive at similar conclusions. 
I’m leading the team at MIRI that will be focusing on the “Alignment for Advanced ML Systems” agenda going forward. It seems like there’s a lot of room for more eyes on these problems, and we’re hoping to hire a number of new researchers and kick off a number of collaborations to tackle these problems. If you’re interested in these problems and have a solid background in mathematics or computer science, I definitely recommend getting in touch or reading more about these problems.

1. I also gave a version of this talk at the MIRI/FHI Colloquium on Robust and Beneficial AI.
2. Alternatively, you may think that AGI won’t look like modern ML in most respects, but that the ML aspects are easier to productively study today and are unlikely to be made completely irrelevant by future developments.
3. Alternatively, you may think timelines are long, but that we should focus on scenarios with shorter timelines because they’re more urgent.
4. Although I’ll use the example of stories here, in real life it could be a system generating plans for curing cancers, and humans evaluating how good the plans are.
5. See the Q&A section of the talk for questions like “Won’t the report be subject to the same concerns as the original story?”
6. Ian J. Goodfellow et al. “Generative Adversarial Nets”. In: Advances in Neural Information Processing Systems 27. Ed. by Z. Ghahramani et al. Curran Associates, Inc., 2014, pp. 2672-2680. URL: https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
7. Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. “Explaining and Harnessing Adversarial Examples”. 2014. arXiv: 1412.6572 [stat.ML]
8. The KWIK learning framework is much more general than this; I’m just giving one example.
9. Nate Soares and Benja Fallenstein. Agent Foundations for Aligning Machine Intelligence with Human Interests: A Technical Research Agenda. Tech. rep. 2014-8.
Forthcoming 2017 in “The Technological Singularity: Managing the Journey” Jim Miller, Roman Yampolskiy, Stuart J. Armstrong, and Vic Callaghan, Eds. Berkeley, CA. Machine Intelligence Research Institute. 2014.
10. Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. “Concrete Problems in AI Safety”. 2016. arXiv: 1606.06565 [cs.AI]

## Preparing for the Biggest Change in Human History

Importance Principle: Advanced AI could represent a profound change in the history of life on Earth, and should be planned for and managed with commensurate care and resources.

In the history of human progress, a few events have stood out as especially revolutionary: the intentional use of fire, the invention of agriculture, the industrial revolution, possibly the invention of computers and the Internet. But many anticipate that the creation of advanced artificial intelligence will tower over these achievements.

In a popular post, Tim Urban with Wait But Why wrote that artificial intelligence is “by far THE most important topic for our future.”

Or, as AI professor Roman Yampolskiy told me, “Design of human-level AI will be the most impactful event in the history of humankind. It is impossible to over-prepare for it.”

The Importance Principle encourages us to plan for what could be the greatest “change in the history of life.” But just what are we preparing for? What will more advanced AI mean for society? I turned to some of the top experts in the field of AI to consider these questions.

### Societal Benefits?

Guruduth Banavar, the Vice President of IBM Research, is hopeful that as AI advances, it will help humanity advance as well. In favor of the principle, he said, “I strongly believe this. I think this goes back to evolution. From the evolutionary point of view, humans have reached their current level of power and control over the world because of intelligence.
… AI is augmented intelligence – it’s a combination of humans and AI working together. And this will produce a more productive and realistic future than autonomous AI, which is too far out. In the foreseeable future, augmented AI – AI working with people – will transform life on the planet. It will help us solve the big problems like those related to the environment, health, and education.” “I think I also agreed with that one,” said Bart Selman, a professor at Cornell University. “Maybe not every person on earth should be concerned about it, but there should be, among scientists, a discussion about these issues and a plan – can you build safety guidelines to work with value alignment work? What can you actually do to make sure that the developments are beneficial in the end?” Anca Dragan, an assistant professor at UC Berkeley, explained, “Ultimately, we work on AI because we believe it can have a strong positive impact on the world. But the more capable the technology becomes, the easier it becomes to misuse it – or perhaps, the effects of misusing it become more drastic. That is why it is so important, as we make progress, to start thinking more strongly about what role AI will play.” ### Short-term Concerns Though the Importance Principle specifically mentions advanced AI, some of the researchers I interviewed pointed out that nearer-term artificial intelligence could also drastically impact humanity. “I believe that AI will create profound change even before it is ‘advanced’ and thus we need to plan and manage growth of the technology,” explained Kay Firth-Butterfield, Executive Director of AI-Austin.org. 
“As humans, we are not good at long-term planning because our civil systems don’t encourage it, however, this is an area in which we must develop our abilities to ensure a responsible and beneficial partnership between man and machine.” Stefano Ermon, an assistant professor at Stanford University, also considered the impacts of less advanced AI, saying, “It’s an incredibly powerful technology. I think it’s even hard to imagine what one could do if we are able to develop a strong AI, but even before that, well before that, the capabilities are really huge. We’ve seen the kind of computers and information technologies we have today, the way they’ve revolutionized our society, our economy, our everyday lives. And my guess is that AI technologies would have the potential to be even more impactful and even more revolutionary on our lives. And so I think it’s going to be a big change and it’s worth thinking very carefully about, although it’s hard to plan for it.” In a follow up question about planning for AI over the shorter term, Selman added, “I think the effect will be quite dramatic. This is another interesting point – sometimes AI scientists say, well it might not be advanced AI will do us in, but dumb AI. … The example is always the self-driving car has no idea it’s driving you anywhere. It doesn’t even know what driving is. … If you looked at the videos of an accident that’s going to happen, people are so surprised that the car doesn’t hit the brakes at all, and that’s because the car works quite differently than humans. So I think there is some short-term [AI] risk in that … we actually think they’re smarter than they are. And I think that will actually go away when the machines become smarter, but for now…” ### Learning From Experience As revolutionary as advanced AI might be, we can still learn from previous technological revolutions and draw on their lessons to prepare for the changes ahead. 
Toby Walsh, a guest professor at the Technical University of Berlin, expressed a common criticism of the principles, arguing that the Importance Principle could – and probably should – apply to many “groundbreaking technologies.”

He explained, “This is one of those principles where I think you could put any society-changing technology in place of advanced AI. … It would be true of the steam engine, in some sense it’s true of social media and we’ve failed at that one, it could be true of the Internet but we failed at planning that well. It could be true of fire too, but we failed on that one as well and used it for war. But to get back to the observation that some of them are things that are not particular to AI – once you realize that AI is going to be groundbreaking, then all of the things that should apply to any groundbreaking technology should apply.”

By looking back at these previous revolutionary technologies and understanding their impacts, perhaps we can gain insight into how we can plan ahead for advanced AI.

Dragan was also interested in more explicit solutions to the problem of planning ahead.

“As the AI capabilities advance,” she told me, “we have to take a step back and ask ourselves: are we solving the right problem? Is there a better problem definition that will more likely result in benefits to humanity?

“For instance, we have always defined AI agents as rational. That means they maximize expected utility. Thus far, utility is assumed to be known. But if you think about it, there is no gospel specifying utility. We are assuming that some *person* somewhere will know exactly what utility to specify for their agent. Well, it turns out, we don’t work like that: it is really hard for people, including AI experts, to specify utility functions. We try our best, but when the system goes ahead and optimizes for what we inputted, the result is sometimes surprising, and not in a good way.
This suggests that our definition of an AI agent is predicated on a wrong assumption. We’ve already started seeing that in robotics – the definition of how a robot should move didn’t account for people, the definition of how a robot should learn from demonstration assumed that people can provide perfect demonstrations to a robot, etc. – I assume we are going to see this more and more in AI as a whole. We have to stop making implicit assumptions about people and end-users of AI, and rigorously tackle that head-on, putting people into the equation.”

### What Do You Think?

What kind of impact will advanced AI have on the development of human progress? How can we prepare for such potentially tremendous changes? Can we prepare? What other questions do we, as a society, need to ask?

This article is part of a weekly series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the weekly discussions about previous principles here.

## Making Deep Learning More Robust

Imagine how much more efficient lawyers could be if they had the time to read every legal book ever written and review every case ever brought to court. Imagine doctors with the ability to study every advancement published across the world’s medical journals, or consult every medical case, ever. Unfortunately, the human brain cannot store that much information, and it would take decades to achieve these feats.

But a computer, one specifically designed to work like the human mind, could.

Deep learning neural networks are designed to mimic the human brain’s neural connections. They are capable of learning through continuous exposure to huge amounts of data.
This allows them to recognize patterns, comprehend complex concepts, and translate high-level abstractions. These networks consist of many layers, each having a different set of weights. The deeper the network, the stronger it is.

Current applications for these networks include medical diagnosis, robotics and engineering, face recognition, and automotive navigation. However, deep learning is still in development – not surprisingly, it is a huge undertaking to get machines to think like humans. In fact, very little is understood about these networks, and months of manual tuning are often required for obtaining excellent performance.

Fuxin Li, assistant professor at the Oregon State University School of Electrical Engineering and Computer Science, and his team are taking on the accuracy of these neural networks under adversarial conditions. Their research focuses on the basic machine learning aspects of deep learning, and how to make general deep learning more robust.

To try to better understand when a deep convolutional neural network (CNN) is going to be right or wrong, Li’s team had to establish an estimate of confidence in the predictions of the deep learning architecture. Those estimates can be used as safeguards when utilizing the networks in real life.

“Basically,” explains Li, “trying to make deep learning increasingly self-aware – to be aware of what type of data it has seen, and what type of data it could work on.”

The team looked at recent advances in deep learning, which have greatly improved the capability to recognize images automatically. Those networks, albeit very resistant to overfitting, were discovered to completely fail if some of the pixels in such images were perturbed via an adversarial optimization algorithm.

To a human observer, the image in question may look fine, but the deep network sees otherwise.
According to the researchers, those adversarial examples are dangerous if a deep network is used in any critical real-world application, such as autonomous driving. If the result of the network can be hacked, wrong authentications and other devastating effects would be unavoidable.

In a departure from previous perspectives that focused on improving the classifiers to correctly classify the adversarial examples, the team focused on detecting those adversarial examples by analyzing whether they come from the same distribution as the normal examples. The accuracy for detecting adversarial examples exceeded 96%. Notably, 90% of the adversarials can be detected with a false positive rate of less than 10%.

The benefits of this research are numerous. It is vital for a neural network to be able to identify whether an example comes from a normal or an adversarial distribution. Such knowledge, if available, will help significantly to control behaviors of robots employing deep learning. A reliable procedure can prevent robots from behaving in an undesirable manner because of false perceptions of their environment.

Li gives one example: “In robotics there’s this big issue about robots not doing something based on erroneous perception. It’s important for a robot to know that it’s not making a confident perception. For example, if [the robot] is saying there’s an object over there, but it’s actually a wall, he’ll go to fetch that object, and then he hits a wall.”

Hopefully, Li says, that won’t happen. However, current software and machine learning have been mostly based solely on prediction confidence within the original machine learning framework. Basically, the testing and training data are assumed to be pulled from the same distribution independently, and that can lead to incorrect assumptions.
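
The idea of checking whether an input comes from the same distribution as the normal examples can be illustrated with a toy sketch. This is not the detector Li’s team built; it is a minimal stand-in that assumes each example has already been reduced to a short list of numeric features, and it flags an input when any feature lies far from the training data. The function names and the threshold of 3 standard deviations are hypothetical choices made for illustration.

```python
import statistics

def fit_reference(samples):
    """Per-feature mean and standard deviation of the normal (training) examples."""
    columns = list(zip(*samples))
    means = [statistics.fmean(col) for col in columns]
    # Guard against zero spread so the division below is always defined.
    stdevs = [statistics.pstdev(col) or 1e-9 for col in columns]
    return means, stdevs

def outlier_score(x, means, stdevs):
    """Largest absolute z-score across features: how far x sits from the reference data."""
    return max(abs(v - m) / s for v, m, s in zip(x, means, stdevs))

def is_suspicious(x, means, stdevs, threshold=3.0):
    """Flag x when it looks unlike the data the model was trained on (assumed threshold)."""
    return outlier_score(x, means, stdevs) > threshold

# Hypothetical two-feature "normal" examples.
normal = [[0.9, 1.1], [1.0, 1.0], [1.1, 0.9], [1.0, 1.2], [0.95, 1.05]]
means, stdevs = fit_reference(normal)

print(is_suspicious([1.0, 1.0], means, stdevs))  # close to the training data
print(is_suspicious([1.0, 5.0], means, stdevs))  # second feature is far outside it
```

A system built this way could decline to act, or hand control to a person, whenever an input is flagged, rather than trusting a confident prediction on data unlike anything it was trained on.
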
Better confidence estimates could potentially help avoid incidents such as the Tesla crash scenario from May 2016, where an adversarial example (a truck with too much light) in the middle of the highway fooled the system. A confidence estimate could potentially solve that issue. But first, the computer must be smarter. The computer has to learn to detect objects and differentiate, say, a tree from another vehicle.

“To make it really robust, you need to account for unknown objects. Something weird may hit you. A deer may jump out.” The network can’t be taught every unexpected situation, says Li, “so you need it to discover them without knowledge of what they are. That’s something that we do. We try to bridge the gap.”

Training procedures will make deep learning more automatic and lead to fewer failures, and will provide confidence estimates when the deep network is utilized to predict new data. Most of this training, explains Li, comes from photo distribution using stock images. However, these are flat images quite different from what a robot would normally see in day-to-day life. It’s difficult to get a 360-degree view just by looking at photos.

“There will be a big difference between the thing [the robot] trains on and the thing it really sees. So then, it is important for the robot to understand that it can predict some things confidently, and others it cannot,” says Li. “[The robot] needs to understand that it probably predicted wrong, so as not to act too aggressively toward its prediction.” This can only be achieved with a more self-aware framework, which is what Li is trying to develop with this grant.

Further, these estimates can be used to control the behavior of a robot employing deep learning so that it will not go on to perform maneuvers that could be dangerous because of erroneous predictions. Understanding these aspects would also be helpful in designing potentially more robust networks in the future.
Soon, Li and his team will start generalizing the approach to other domains, such as temporal models (RNNs, LSTMs) and deep reinforcement learning. In reinforcement learning, the confidence estimates could play an important role in many decision-making paradigms.

Li’s most recent update on this work can be found here.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

## How Smart Can AI Get?

Capability Caution Principle: There being no consensus, we should avoid strong assumptions regarding upper limits on future AI capabilities.

A major change is coming, over unknown timescales but across every segment of society, and the people playing a part in that transition have a huge responsibility and opportunity to shape it for the best. What will trigger this change? Artificial intelligence.

The 23 Asilomar AI Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the weekly discussions about previous principles here.

### Capability Caution

One of the greatest questions facing AI researchers is: just how smart and capable can artificial intelligence become?

In recent years, the development of AI has accelerated in leaps and bounds. DeepMind’s AlphaGo surpassed human performance in the challenging, intricate game of Go, and the company has created AI that can quickly learn to play Atari video games with much greater prowess than a person. We’ve also seen breakthroughs and progress in language translation, self-driving vehicles, and even the creation of new medicinal molecules.

But how much more advanced can AI become?
Will it continue to excel only in narrow tasks, or will it develop broader learning skills that will allow a single AI to outperform a human in most tasks? How do we prepare for an AI more intelligent than we can imagine?

Some experts think human-level or even super-human AI could be developed in a matter of a couple of decades, while some don’t think anyone will ever accomplish this feat. The Capability Caution Principle argues that, until we have concrete evidence to confirm what an AI can someday achieve, it’s safer to assume that there are no upper limits – that is, for now, anything is possible and we need to plan accordingly.

### Expert Opinion

The Capability Caution Principle drew both consensus and disagreement from the experts. While everyone I interviewed generally agreed that we shouldn’t assume upper limits for AI, their reasoning varied and some raised concerns.

Stefano Ermon, an assistant professor at Stanford, and Roman Yampolskiy, an associate professor at the University of Louisville, both took a better-safe-than-sorry approach.

Ermon turned to history as a reminder of how difficult future predictions are. He explained, “It’s always hard to predict the future. … Think about what people were imagining a hundred years ago, about what the future would look like. … I think it would’ve been very hard for them to imagine what we have today. I think we should take a similar, very cautious view, about making predictions about the future. If it’s extremely hard, then it’s better to play it safe.”

Yampolskiy considered current tech safety policies, saying, “In many areas of computer science such as complexity or cryptography the default assumption is that we deal with the worst case scenario. Similarly, in AI Safety we should assume that AI will become maximally capable and prepare accordingly. If we are wrong we will still be in great shape.”

Dan Weld, a professor at the University of Washington, said of the principle, “I agree!
As a scientist, I’m against making strong or unjustified assumptions about anything, so of course I agree.”

But though he agreed with the basic idea behind the principle, Weld also had reservations. “This principle bothers me,” Weld explained, “… because it seems to be implicitly saying that there is an immediate danger that AI is going to become superhumanly, generally intelligent very soon, and we need to worry about this issue. This assertion … concerns me because I think it’s a distraction from what are likely to be much bigger, more important, more near-term, potentially devastating problems. I’m much more worried about job loss and the need for some kind of guaranteed health-care, education and basic income than I am about Skynet. And I’m much more worried about some terrorist taking an AI system and trying to program it to kill all Americans than I am about an AI system suddenly waking up and deciding that it should do that on its own.”

Looking at the problem from a different perspective, Guruduth Banavar, the Vice President of IBM Research, worries that placing upper bounds on AI capabilities could limit the beneficial possibilities. Banavar explained, “The general idea is that intelligence, as we understand it today, is ultimately the ability to process information from all possible sources and to use that to predict the future and to adapt to the future. It is entirely in the realm of possibility that machines can do that. … I do think we should avoid assumptions of upper limits on machine intelligence because I don’t want artificial limits on how advanced AI can be.”

IBM research scientist Francesca Rossi considered this principle from yet another perspective, suggesting that AI is necessary for humanity to reach our full capabilities, where we also don’t want to assume upper limits.
“I personally am for building AI systems that augment human intelligence instead of replacing human intelligence,” said Rossi. “And I think that in that space of augmenting human intelligence there really is a huge potential for AI in making the personal and professional lives of everybody much better. I don’t think that there are upper limits of the future AI capabilities in that respect. I think more and more AI systems together with humans will enhance our kind of intelligence, which is complementary to the kind of intelligence that machines have, and will help us make better decisions, and live better, and solve problems that we don’t know how to solve right now. I don’t see any upper limit to that.”

### What do you think?

Is there an upper limit to artificial intelligence? Is there an upper limit to what we can achieve with AI? How long will it take to achieve increasing levels of advanced AI? How do we plan for the future with such uncertainties? How can society as a whole address these questions? What other questions should we be asking about AI capabilities?

## MIRI February 2017 Newsletter

Following up on a post outlining some of the reasons MIRI researchers and OpenAI researcher Paul Christiano are pursuing different research directions, Jessica Taylor has written up the key motivations for MIRI’s highly reliable agent design research.

Research updates

General updates

• We attended the Future of Life Institute’s Beneficial AI conference at Asilomar. See Scott Alexander’s recap. MIRI executive director Nate Soares was on a technical safety panel discussion with representatives from DeepMind, OpenAI, and academia (video), also featuring a back-and-forth with Yann LeCun, the head of Facebook’s AI research group (at 22:00).
• MIRI staff and a number of top AI researchers are signatories on FLI’s new Asilomar AI Principles, which include cautions regarding arms races, value misalignment, recursive self-improvement, and superintelligent AI.
• The Center for Applied Rationality recounts MIRI researcher origin stories and other cases where their workshops have been a big assist to our work, alongside examples of CFAR’s impact on other groups.
• The Open Philanthropy Project has awarded a $32,000 grant to AI Impacts.
• Andrew Critch spoke at Princeton’s ENVISION conference (video).
• Matthew Graves has joined MIRI as a staff writer. See his first piece for our blog, a reply to “Superintelligence: The Idea That Eats Smart People.”
• The audio version of Rationality: From AI to Zombies is temporarily unavailable due to the shutdown of Castify. However, fans are already putting together a new free recording of the full collection.

• An Asilomar panel on superintelligence (video) gathers Elon Musk (OpenAI), Demis Hassabis (DeepMind), Ray Kurzweil (Google), Stuart Russell and Bart Selman (CHCAI), Nick Bostrom (FHI), Jaan Tallinn (CSER), Sam Harris, and David Chalmers.
• Also from Asilomar: Russell on corrigibility (video), Bostrom on openness in AI (video), and LeCun on the path to general AI (video).
• From MIT Technology Review’s “AI Software Learns to Make AI Software”:
Companies must currently pay a premium for machine-learning experts, who are in short supply. Jeff Dean, who leads the Google Brain research group, mused last week that some of the work of such workers could be supplanted by software. He described what he termed “automated machine learning” as one of the most promising research avenues his team was exploring.

## The Financial World of AI

Automated algorithms currently manage over half of trading volume in US equities, and as AI improves, it will continue to assume control over important financial decisions. But these systems aren’t foolproof. A small glitch could send shares plunging, potentially costing investors billions of dollars.

For firms, the decision to accept this risk is simple. The algorithms in automated systems are faster and more accurate than any human, and deploying the most advanced AI technology can keep firms in business.

But for the rest of society, the consequences aren’t clear. Artificial intelligence gives firms a competitive edge, but will these rapidly advancing systems remain safe and robust? What happens when they make mistakes?

### Automated Errors

Michael Wellman, a professor of computer science at the University of Michigan, studies AI’s threats to the financial system. He explains, “The financial system is one of the leading edges of where AI is automating things, and it’s also an especially vulnerable sector. It can be easily disrupted, and bad things can happen.”

Consider the story of Knight Capital. On August 1, 2012, Knight decided to try out new software to stay competitive in a new trading pool. The software passed its safety tests, but when Knight deployed it, the algorithm activated its testing software instead of the live trading program. The testing software sent millions of bad orders in the following minutes as Knight frantically tried to stop it. But the damage was done.

In just 45 minutes, Knight Capital lost $440 million – nearly four times their profit in 2011 – all because of one line of code. In this case, the damage was constrained to Knight, but what happens when one line of code can impact the entire financial system?

### Understanding Autonomous Trading Agents

Wellman argues that autonomous trading agents are difficult to control because they process and respond to information at unprecedented speeds, they can be easily replicated on a large scale, they act independently, and they adapt to their environment.

With increasingly general capabilities, systems may learn to make money in dangerous ways that their programmers never intended. As Lawrence Pingree, an analyst at Gartner, said after the Knight meltdown, “Computers do what they’re told. If they’re told to do the wrong thing, they’re going to do it and they’re going to do it really, really well.”

In order to prevent AI systems from undermining market transparency and stability, government agencies and academics must learn how these agents work.

### Market Manipulation

Even benign uses of AI can hinder market transparency, but Wellman worries that AI systems will learn to manipulate markets.

Autonomous trading agents are especially effective at exploiting arbitrage opportunities – where they simultaneously purchase and sell an asset to profit from pricing differences. If, for example, a stock trades at $30 in one market and $32 in a second market, an agent can buy the $30 stock and immediately sell it for $32 in the second market, making a $2 profit.
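
The arithmetic of that arbitrage example can be written out as a short sketch. The function name, the share count, and the per-share fee are hypothetical additions for illustration; the article’s example involves a single share and no trading costs.

```python
def arbitrage_profit(buy_price, sell_price, shares=1, fee_per_share=0.0):
    """Profit from buying in the cheaper market and selling in the dearer one.

    fee_per_share is an assumed, simplified trading cost; it is not part of
    the article's example.
    """
    return (sell_price - buy_price - fee_per_share) * shares

# The article's case: buy at $30, sell at $32, one share, no fees.
print(arbitrage_profit(30.0, 32.0))  # 2.0

# Hypothetical larger trade with an assumed $0.50 cost per share.
print(arbitrage_profit(30.0, 32.0, shares=1000, fee_per_share=0.5))  # 1500.0
```

The per-share gain is small, which is why speed and scale matter so much to automated agents: the same $2 gap is worth far more when it can be traded thousands of times before the prices converge.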

Market inefficiency naturally creates arbitrage opportunities. However, an AI may learn – on its own – to create pricing discrepancies by taking misleading actions that move the market to generate profit.

One manipulative technique is ‘spoofing’ – the act of bidding for a stock item with the intent to cancel the bid before execution. This moves the market in a certain direction, and the spoofer profits from the false signal.

Wellman and his team recently reproduced spoofing in their laboratory models, as part of an effort to understand the situations where spoofing can be effective. He explains, “We’re doing this in the laboratory to see if we can characterize the signature of AIs doing this, so that we reliably detect it and design markets to reduce vulnerability.”
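
To give a feel for what a behavioral “signature” might look like, here is a deliberately crude sketch. It is not Wellman’s method, and real surveillance is far more involved. It assumes a hypothetical log of orders and flags any trader who cancels nearly all of the volume they place, which is one pattern spoofing would tend to leave behind. The `Order` fields and the 0.9 threshold are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Order:
    trader: str
    size: int
    cancelled: bool  # True if the order was withdrawn before it executed

def cancel_ratios(orders):
    """Fraction of each trader's placed volume that was cancelled before execution."""
    placed = defaultdict(int)
    cancelled = defaultdict(int)
    for order in orders:
        placed[order.trader] += order.size
        if order.cancelled:
            cancelled[order.trader] += order.size
    return {trader: cancelled[trader] / placed[trader] for trader in placed}

def flag_traders(orders, threshold=0.9):
    """Traders whose cancel ratio meets an assumed threshold, in sorted order."""
    return sorted(t for t, ratio in cancel_ratios(orders).items() if ratio >= threshold)

# Hypothetical order log: B places large orders and withdraws almost all of them.
orders = [
    Order("A", 100, False), Order("A", 50, True),
    Order("B", 5000, True), Order("B", 5000, True), Order("B", 100, False),
]
print(flag_traders(orders))  # ['B']
```

A high cancel ratio alone does not prove manipulation, since legitimate market makers also cancel often, so a heuristic like this could only serve as a first filter before closer inspection.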

As agents improve, they may learn to exploit arbitrage more maliciously by creating artificial items on the market to mislead traders, or by hacking accounts to report false events that move markets. Wellman’s work aims to produce methods to help control such manipulative behavior.

### Secrecy in the Financial World

But the secretive nature of finance prevents academics from fully understanding the role of AI.

Wellman explains, “We know they use AI and machine learning to a significant extent, and they are constantly trying to improve their algorithms. We don’t know to what extent things like market manipulation and spoofing are automated right now, but we know that they could be automated and that could lead to something of an arms race between market manipulators and the systems trying to detect and run surveillance for market bad behavior.”

Government agencies – such as the Securities and Exchange Commission – watch financial markets, but “they’re really outgunned as far as the technology goes,” Wellman notes. “They don’t have the expertise or the infrastructure to keep up with how fast things are changing in the industry.”

But academics can help. According to Wellman, “even without doing the trading for money ourselves, we can reverse engineer what must be going on in the financial world and figure out what can happen.”

Although Wellman studies current and near-term AI, he’s concerned about the threat of advanced, general AI.

“One thing we can do to try to understand the far-out AI is to get experience with dealing with the near-term AI,” he explains. “That’s why we want to look at regulation of autonomous agents that are very near on the horizon or current. The hope is that we’ll learn some lessons that we can then later apply when the superintelligence comes along.”

AI systems are improving rapidly, and there is intense competition between financial firms to use them. Understanding and tracking AI’s role in finance will help financial markets remain stable and transparent.

“We may not be able to manage this threat with 100% reliability,” Wellman admits, “but I’m hopeful that we can redesign markets to make them safer for the AIs and eliminate some forms of the arms race, and that we’ll be able to get a good handle on preventing some of the most egregious behaviors.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

## Can We Ensure Privacy in the Era of Big Data?

Personal Privacy Principle: People should have the right to access, manage and control the data they generate, given AI systems’ power to analyze and utilize that data.

A major change is coming, over unknown timescales but across every segment of society, and the people playing a part in that transition have a huge responsibility and opportunity to shape it for the best. What will trigger this change? Artificial intelligence.

The 23 Asilomar AI Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the weekly discussions about previous principles here.

### Personal Privacy

In the age of social media and online profiles, maintaining privacy is already a tricky problem. As companies collect ever-increasing quantities of data about us, and as AI programs get faster and more sophisticated at analyzing that data, our information can become both a commodity for business and a liability for us.

We’ve already seen small examples of questionable data use, such as Target recognizing a teenager was pregnant before her family knew. But this is merely advanced marketing. What happens when governments or potential employers can gather what seems like innocent and useless information (like grocery shopping preferences) to uncover your most intimate secrets – like health issues even you didn’t know about yet?

It turns out, all of the researchers I spoke to strongly agree with the Personal Privacy Principle.

### The Importance of Personal Privacy

“I think that’s a big immediate issue,” says Stefano Ermon, an assistant professor at Stanford. “I think when the general public thinks about AI safety, maybe they think about killer robots or these kind of apocalyptic scenarios, but there are big concrete issues like privacy, fairness, and accountability.”

“I support that principle very strongly!” agrees Dan Weld, a professor at the University of Washington. “I’m really quite worried about the loss of privacy. The number of sensors is increasing and combined with advanced machine learning, there are few limits to what companies and governments can learn about us. Now is the time to insist on the ability to control our own data.”

Toby Walsh, a guest professor at the Technical University of Berlin, also worries about privacy. “Yes, this is a great one, and actually I’m really surprised how little discussion we have around AI and privacy,” says Walsh. “I thought there was going to be much more fallout from Snowden and some of the revelations that happened, and AI, of course, is enabling technology. If you’re collecting all of this data, the only way to make sense of it is to use AI, so I’ve been surprised that there hasn’t been more discussion and more concern amongst the public around these sorts of issues.”

Kay Firth-Butterfield, an adjunct professor at the University of Texas in Austin, adds, “As AI becomes more powerful, we need to take steps to ensure that it cannot use our personal data against us if it falls into the wrong hands.”

Taking this concern a step further, Roman Yampolskiy, an associate professor at the University of Louisville, argues that “the world’s dictatorships are looking forward to opportunities to target their citizenry with extreme levels of precision.”

“The tech we will develop,” he continues, “will most certainly become available throughout the world and so we have a responsibility to make privacy a fundamental cornerstone of any data analysis.”

But some of the researchers also worry about the money to be made from personal data.

Ermon explains, “Privacy is definitely a big one, and one of the most valuable things that these large corporations have is the data they are collecting from us, so we should think about that carefully.”

“Data is worth money,” agrees Firth-Butterfield, “and as individuals we should be able to choose when and how to monetize our own data whilst being encouraged to share data for public health and other benefits.”

Francesca Rossi, a research scientist for IBM, believes this principle is “very important,” but she also emphasizes the benefits we can gain if we can share our data without fearing it will be misused. She says, “People should really have the right to own their privacy, and companies like IBM or any other that provide AI capabilities and systems should protect the data of their clients. The quality and amount of data is essential for many AI systems to work well, especially in machine learning. … It’s also very important that these companies don’t just assure that they are taking care of the data, but that they are transparent about the use of the data. Without this transparency and trust, people will resist giving their data, which would be detrimental to the AI capabilities and the help AI can offer in solving their health problems, or whatever the AI is designed to solve.”

### Privacy as a Social Right

Both Yoshua Bengio and Guruduth Banavar argued that personal privacy isn’t just something that AI researchers should value, but that it should also be considered a social right.

Bengio, a professor at the University of Montreal, says, “We should be careful that the complexity of AI systems doesn’t become a tool for abusing minorities or individuals who don’t have access to understand how it works. I think this is a serious social rights issue.” But he also worries that preventing rights violations may not be an easy technical fix. “We have to be careful with that because we may end up barring machine learning from publicly used systems, if we’re not careful,” he explains, adding, “the solution may not be as simple as saying ‘it has to be explainable,’ because it won’t be.”

And as Ermon says, “The more we delegate decisions to AI systems, the more we’re going to run into these issues.”

Meanwhile, Banavar, the Vice President of IBM Research, considers the issue of personal privacy rights especially important. He argues, “It’s absolutely crucial that individuals should have the right to manage access to the data they generate. … AI does open new insight to individuals and institutions. It creates a persona for the individual or institution – personality traits, emotional make-up, lots of the things we learn when we meet each other. AI will do that too and it’s very personal. I want to control how [my] persona is created. A persona is a fundamental right.”

### What Do You Think?

And now we turn the conversation over to you. What does personal privacy mean to you? How important is it to have control over your data? The experts above may have agreed about how serious the problem of personal privacy is, but solutions are harder to come by. Do we need to enact new laws to protect the public? Do we need new corporate policies? How can we ensure that companies and governments aren’t using our data for nefarious purposes – or even for well-intentioned purposes that still aren’t what we want? What else should we, as a society, be asking?

## How Do We Align Artificial Intelligence with Human Values?

A major change is coming, over unknown timescales but across every segment of society, and the people playing a part in that transition have a huge responsibility and opportunity to shape it for the best. What will trigger this change? Artificial intelligence.

Recently, some of the top minds in AI and related fields got together to discuss how we can ensure AI remains beneficial throughout this transition, and the result was the Asilomar AI Principles document. The intent of these 23 principles is to offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.”

The Principles represent the beginning of a conversation, and now that the conversation is underway, we need to follow up with broad discussion about each individual principle. The Principles will mean different things to different people, and in order to benefit as much of society as possible, we need to think about each principle individually.

As part of this effort, I interviewed many of the AI researchers who signed the Principles document to learn their take on why they signed and what issues still confront us.

### Value Alignment

Value Alignment: Highly autonomous AI systems should be designed so that their goals and behaviors can be assured to align with human values throughout their operation.

Stuart Russell, who helped pioneer the idea of value alignment, likes to compare this to the King Midas story. When King Midas asked for everything he touched to turn to gold, he really just wanted to be rich. He didn’t actually want his food and loved ones to turn to gold. We face a similar situation with artificial intelligence: how do we ensure that an AI will do what we really want, while not harming humans in a misguided attempt to do what its designer requested?

“Robots aren’t going to try to revolt against humanity,” explains Anca Dragan, an assistant professor and colleague of Russell’s at UC Berkeley, “they’ll just try to optimize whatever we tell them to do. So we need to make sure to tell them to optimize for the world we actually want.”

### What Do We Want?

Understanding what “we” want is among the biggest challenges facing AI researchers.

“The issue, of course, is to define what exactly these values are, because people might have different cultures, [come from] different parts of the world, [have] different socioeconomic backgrounds — I think people will have very different opinions on what those values are. And so that’s really the challenge,” says Stefano Ermon, an assistant professor at Stanford.

Roman Yampolskiy, an associate professor at the University of Louisville, agrees. He explains, “It is very difficult to encode human values in a programming language, but the problem is made more difficult by the fact that we as humanity do not agree on common values, and even parts we do agree on change with time.”

And while some values are hard to gain consensus around, there are also lots of values we all implicitly agree on. As Russell notes, any human understands emotional and sentimental values that they’ve been socialized with, but it’s difficult to guarantee that a robot will be programmed with that same understanding.

But IBM research scientist Francesca Rossi is hopeful. As Rossi points out, “there is scientific research that can be undertaken to actually understand how to go from these values that we all agree on to embedding them into the AI system that’s working with humans.”

Dragan’s research comes at the problem from a different direction. Instead of trying to understand people, she looks at trying to train a robot or AI to be flexible with its goals as it interacts with people. She explains, “At Berkeley, … we think it’s important for agents to have uncertainty about their objectives, rather than assuming they are perfectly specified, and treat human input as valuable observations about the true underlying desired objective.”
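One way to picture an agent that stays uncertain about its objective is a simple Bayesian update over candidate goals. The sketch below is only an illustration of that idea, not the Berkeley group's method: the two candidate objectives, the `update_belief` helper, and every probability in it are hypothetical.

```python
def update_belief(prior, likelihoods):
    """Bayes' rule: posterior is proportional to prior times the
    likelihood of the observed human input under each candidate objective."""
    unnormalized = {goal: prior[goal] * likelihoods[goal] for goal in prior}
    total = sum(unnormalized.values())
    return {goal: weight / total for goal, weight in unnormalized.items()}


# Hypothetical candidate objectives; the agent starts out undecided.
belief = {"deliver_fast": 0.5, "deliver_safely": 0.5}

# Hypothetical observation: the human says "slow down".
# These numbers are assumptions about how likely that remark is
# under each candidate objective.
likelihood_of_slow_down = {"deliver_fast": 0.1, "deliver_safely": 0.9}

belief = update_belief(belief, likelihood_of_slow_down)
print(belief)
```

After one remark from the human, most of the probability mass moves to the objective that better explains it, but the other objective is not ruled out, so later input can still change the agent's mind.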

### Rewrite the Principle?

While most researchers agree with the underlying idea of the Value Alignment Principle, not everyone agrees with how it’s phrased, let alone how to implement it.

Yoshua Bengio, an AI pioneer and professor at the University of Montreal, suggests “assured” may be too strong. He explains, “It may not be possible to be completely aligned. There are a lot of things that are innate, which we won’t be able to get by machine learning, and that may be difficult to get by philosophy or introspection, so it’s not totally clear we’ll be able to perfectly align. I think the wording should be something along the lines of ‘we’ll do our best.’ Otherwise, I totally agree.”

Walsh, who’s currently a guest professor at the Technical University of Berlin, questions the use of the word “highly.” “I think any autonomous system, even a lowly autonomous system, should be aligned with human values. I’d wordsmith away the ‘high,’” he says.

Walsh also points out that, while value alignment is often considered an issue that will arise in the future, he believes it’s something that needs to be addressed sooner rather than later. “I think that we have to worry about enforcing that principle today,” he explains. “I think that will be helpful in solving the more challenging value alignment problem as systems get more sophisticated.”

Rossi, who supports the Value Alignment Principle as “the one closest to my heart,” agrees that the principle should apply to current AI systems. “I would be even more general than what you’ve written in this principle,” she says. “Because this principle has to do not only with autonomous AI systems, but … is very important and essential also for systems that work tightly with humans-in-the-loop and where the human is the final decision maker. When you have a human and machine tightly working together, you want this to be a real team.”

But as Dragan explains, “This is one step toward helping AI figure out what it should do, and continuously refining the goals should be an ongoing process between humans and AI.”

### Let the Dialogue Begin

And now we turn the conversation over to you. What does it mean to you to have artificial intelligence aligned with your own life goals and aspirations? How can it be aligned with you and everyone else in the world at the same time? How do we ensure that one person’s version of an ideal AI doesn’t make your life more difficult? How do we go about agreeing on human values, and how can we ensure that AI understands these values? If you have a personal AI assistant, how should it be programmed to behave? If we have AI more involved in things like medicine or policing or education, what should that look like? What else should we, as a society, be asking?

## Podcast: Top AI Breakthroughs, with Ian Goodfellow and Richard Mallah

2016 saw some significant AI developments. To talk about the AI progress of the last year, we turned to Richard Mallah and Ian Goodfellow. Richard is the director of AI projects at FLI, a senior advisor to multiple AI companies, and the creator of the highest-rated enterprise text analytics platform. Ian is a research scientist at OpenAI, the lead author of the Deep Learning textbook, and a lead inventor of Generative Adversarial Networks.

The following interview has been heavily edited for brevity.

Ariel: Two events stood out to me in 2016. The first was AlphaGo, which beat the world’s top Go champion, Lee Sedol, last March. What is AlphaGo, and why was this such an incredible achievement?

Ian: AlphaGo was DeepMind’s system for playing the game of Go. It’s a game where you place stones on a board with two players, the object being to capture as much territory as possible. But there are hundreds of different positions where we can place a stone on each turn. It’s not even remotely possible to use a computer to simulate many different Go games and figure out how the game will progress in the future. The computer needs to rely on intuition the same way that human Go players can look at a board and get kind of a sixth sense that tells them whether the game is going well or poorly for them, and where they ought to put the next stone. It’s computationally infeasible to explicitly calculate what each player should do next.

Richard: The DeepMind team has one network for what’s called value learning and another deep network for policy learning. The policy is, basically, which places should I evaluate for the next piece. The value network is how good that state is, in terms of the probability that the agent will be winning. And then they do a Monte Carlo tree search, which means it has some randomness and many different paths — on the order of thousands of evaluations. So it’s much more like a human considering a handful of different moves and trying to determine how good those moves would be.
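The interplay Richard describes, where a policy suggests which moves to look at and a value estimate says how good they turned out, can be sketched as a move-selection rule inside a tree search. This is a minimal sketch, not DeepMind's code: the PUCT-style scoring formula, the `c_puct` constant, and the move statistics are assumptions brought in for illustration.

```python
import math


def puct_score(prior, value_sum, visits, parent_visits, c_puct=1.5):
    """Combine the average value seen so far with an exploration bonus
    that is large for moves the policy likes but the search has rarely tried."""
    average_value = value_sum / visits if visits else 0.0
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return average_value + exploration


def select_move(stats, c_puct=1.5):
    """Pick the move with the highest score among the candidates."""
    parent_visits = sum(s["visits"] for s in stats.values())
    return max(
        stats,
        key=lambda move: puct_score(
            stats[move]["prior"],
            stats[move]["value_sum"],
            stats[move]["visits"],
            parent_visits,
            c_puct,
        ),
    )


# Hypothetical statistics for three candidate moves:
# "prior" stands in for the policy network's output,
# "value_sum" for accumulated value-network evaluations.
stats = {
    "a": {"prior": 0.6, "value_sum": 4.0, "visits": 10},
    "b": {"prior": 0.3, "value_sum": 0.0, "visits": 0},
    "c": {"prior": 0.1, "value_sum": 1.6, "visits": 2},
}
print(select_move(stats))
```

With these made-up numbers the rule picks move "b": it has never been visited, and its prior is high enough that the exploration bonus outweighs the average values of the other two moves. Repeating this selection thousands of times is what concentrates the search on a handful of promising moves.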

Ian: From 2012 to 2015 we saw a lot of breakthroughs where the exciting thing was that AI was able to copy a human ability. In 2016, we started to see breakthroughs that were all about exceeding human performance. Part of what was so exciting about AlphaGo was that AlphaGo did not only learn how to predict what a human expert Go player would do, AlphaGo also improved beyond that by practicing playing games against itself and learning how to be better than the best human player. So we’re starting to see AI move beyond what humans can tell the computer to do.

Ariel: So how will this be applied to applications that we’ll interact with on a regular basis? How will we start to see these technologies and techniques in action ourselves?

Richard: With these techniques, a lot of them are research systems. It’s not necessarily that they’re going to directly go down the pipeline towards productization, but they are helping the models that are implicitly learned inside of AI systems and machine learning systems to get much better.

Ian: There are other strategies for generating new experiences that resemble previously seen experiences. One of them is called WaveNet. It’s a model produced by DeepMind in 2016 for generating speech. If you provide a sentence, just written down, and you’d like to hear that sentence spoken aloud, WaveNet can create an audio waveform that sounds very realistically like a human pronouncing that sentence written down. The main drawback to WaveNet right now is that it’s fairly slow. It has to generate the audio waveform one piece at a time. I believe it takes WaveNet two minutes to produce one second of audio, so it’s not able to make the audio fast enough to hold an interactive conversation.

Richard: And similarly, we’ve seen applications to colorizing black and white photos, or turning sketches into somewhat photo-realistic images, being able to turn text into images.

Ian: Yeah, one thing that really highlights how far we’ve come is that in 2014, one of the big breakthroughs was the ability to take a photo and produce a sentence summarizing what was in the photo. In 2016, we saw different methods for taking a sentence and producing a photo that contains the imagery described by the sentence. It’s much more complicated to go from a few words to a very realistic image containing thousands or millions of pixels than it is to go from the image to the words.

Another thing that was very exciting in 2016 was the use of generative models for drug discovery. Instead of imagining new images, the model could actually imagine new molecules that are intended to have specific medicinal effects.

Richard: And this is pretty exciting because this is being applied towards cancer research, developing potential new cancer treatments.

Ariel: And then there was Google’s language translation program, Google Neural Machine Translation. Can you talk about what that did and why it was a big deal?

Ian: It’s a big deal for two different reasons. First, Google Neural Machine Translation is a lot better than previous approaches to machine translation. Google Neural Machine Translation removes a lot of the human design elements, and just has a neural network figure out what to do.

The other thing that’s really exciting about Google Neural Machine Translation is that the machine translation models have developed what we call an “Interlingua.” It used to be that if you wanted to translate from Japanese to Korean, you had to find a lot of sentences that had been translated from Japanese to Korean before, and then you could train a machine learning model to copy that translation procedure. But now, if you already know how to translate from English to Korean, and you know how to translate from English to Japanese, in the middle, you have Interlingua. So you translate from English to Interlingua and then to Japanese, English to Interlingua and then to Korean. You can also just translate Japanese to Interlingua and Korean to Interlingua and then Interlingua to Japanese or Korean, and you never actually have to get translated sentences from every pair of languages.
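The saving Ian describes can be put in rough numbers. The counting below is a deliberate simplification and an assumption, not a description of Google's system: it treats every ordered language pair as needing its own translated sentences in the direct case, and treats the shared representation as needing only one mapping in and one mapping out per language.

```python
def directions_with_direct_pairs(n_languages):
    """Every ordered (source, target) pair needs its own translated sentences."""
    return n_languages * (n_languages - 1)


def mappings_via_shared_representation(n_languages):
    """Each language only needs a mapping into and out of the shared representation."""
    return 2 * n_languages


for n in (3, 10, 100):
    print(n, directions_with_direct_pairs(n), mappings_via_shared_representation(n))
```

For 10 languages that is 90 translation directions against 20 mappings, and the gap widens quickly as more languages are added.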

Ariel: How can the techniques that are used for language apply elsewhere? How do you anticipate seeing this developed in 2017 and onward?

Richard: So I think what we’ve learned from the approach is that deep learning systems are able to create extremely rich models of the world that can actually express what we can think, which is a pretty exciting milestone. Being able to combine that Interlingua with more structured information about the world is something that a variety of teams are working on — it is a big, open area for the coming years.

Ian: At OpenAI one of our largest projects, Universe, allows a reinforcement learning agent to play many different computer games, and it interacts with these games in the same way that a human does, by sending key presses or mouse strokes to the actual game engine. The same reinforcement learning agent is able to interact with basically anything that a human can interact with on a computer. By having one agent that can do all of these different things we will really exercise our ability to create general artificial intelligence instead of application-specific artificial intelligence. And projects like Google’s Interlingua have shown us that there’s a lot of reason to believe that this will work.

Ariel: What else happened this year that you guys think is important to mention?

Richard: One-shot [learning] is when you see just a little bit of data, potentially just one data point, regarding some new task or some new category, and you’re then able to deduce what that class should look like or what that function should look like in general. So being able to train systems on very little data from just general background knowledge, will be pretty exciting.
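One simple way to picture one-shot learning is nearest-neighbor classification with a single labeled example per new category. The sketch below assumes that approach for illustration; Richard doesn't name a method. The feature vectors are hypothetical stand-ins for what a system with general background knowledge might produce, and `classify_one_shot` is an illustrative helper.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_one_shot(query, support):
    """support maps each class label to the feature vector of its single example."""
    return min(support, key=lambda label: euclidean(query, support[label]))


# Hypothetical features; exactly one example per new category.
support = {
    "zebra": [0.9, 0.1, 0.8],
    "okapi": [0.2, 0.9, 0.7],
}
query = [0.8, 0.2, 0.9]
label = classify_one_shot(query, support)
print(label)
```

The heavy lifting in practice is in learning features good enough that one example per class is sufficient; the classification step itself can stay this simple.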

Ian: One thing that I’m very excited about is this new area called machine learning security, where an attacker can trick a machine learning system into taking the wrong action. For example, we’ve seen that it’s very easy to fool an object-recognition system. We can show it an image that looks a lot like a panda and it gets recognized as being a school bus, or vice versa. It’s actually possible to fool machine learning systems with physical objects. There was a paper called Accessorize to a Crime that showed that by wearing unusually-colored glasses it’s possible to thwart a face recognition system. And my own collaborators at Google Brain and I wrote a paper called Adversarial Examples in the Physical World, where we showed that we can make images that look kind of grainy and noisy, but when viewed through a camera we can control how an object-recognition system will respond to those images.
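To make the idea of fooling a classifier concrete, here is a minimal sketch that applies the fast gradient sign method to a toy logistic classifier. The interview doesn't name a method or a model, so the choice of technique is an assumption, and the weights, input, and `epsilon` are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that input x belongs to class 1 under a logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_perturb(weights, bias, x, true_label, epsilon):
    """Nudge every input feature by epsilon in the direction that increases the loss."""
    p = predict_proba(weights, bias, x)
    # For a logistic model with cross-entropy loss, d(loss)/d(x_i) = (p - y) * w_i.
    grad = [(p - true_label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input.
weights, bias = [2.0, -3.0, 1.0], 0.0
x = [0.5, 0.1, 0.2]
x_adv = fgsm_perturb(weights, bias, x, true_label=1, epsilon=0.2)

print(predict_proba(weights, bias, x) > 0.5)      # classified as class 1
print(predict_proba(weights, bias, x_adv) > 0.5)  # no longer class 1
```

No feature moves by more than `epsilon`, yet the predicted class flips. Image classifiers have far more input dimensions, which is part of why perturbations too small to notice can be enough.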

Ariel: Is there anything else that you thought was either important for 2016 or looking forward to 2017?

Richard: Yeah, looking forward to 2017 I think there will be more focus on unsupervised learning. Most of the world is not annotated by humans. There aren’t little sticky notes on things around the house saying what they are. Being able to process [the world] in a more unsupervised way will unlock a plethora of new applications.

Ian: It will also make AI more democratic. Right now, if you want to use really advanced AI you need to have not only a lot of computers but also a lot of data. That’s part of why it’s mostly very large companies that are competitive in the AI space. If you want to get really good at a task you basically become good at that task by showing the computer a million different examples. In the future, we’ll have AI that can learn much more like a human learns, where just showing it a few examples is enough. Once we have machine learning systems that are able to get the general idea of what’s going on very quickly, in the way that humans do, it won’t be necessary to build these gigantic data sets anymore.

Richard: One application area I think will be important this coming year is automatic detection of fake news, fake audio and fake images and fake video. Some of the applications this past year have actually focused on generating additional frames of video. As those get better, as the photo generation that we talked about earlier gets better, and also as audio templating gets better… I think it was Adobe that demoed what they called PhotoShop for Voice, where you can type something in and select a person, and it will sound like that person saying whatever it is that you typed. So we’ll need ways of detecting that, since this whole concept of fake news is quite at the fore these days.

Ian: It’s worth mentioning that there are other ways of addressing the spread of fake news. Email spam uses a lot of different clues that it can statistically associate with whether people mark the email as spam or not. We can do a lot without needing to advance the underlying AI systems at all.
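The statistical association Ian mentions for spam can be sketched with a small naive Bayes filter. Naive Bayes is an assumption here, chosen because it is a common textbook approach; the interview doesn't say which technique spam filters use. The training messages and the helper names are hypothetical.

```python
import math
from collections import Counter


def train(messages):
    """messages: list of (text, is_spam) pairs, e.g. from users marking email."""
    word_counts = {True: Counter(), False: Counter()}
    label_counts = Counter()
    for text, is_spam in messages:
        label_counts[is_spam] += 1
        word_counts[is_spam].update(text.lower().split())
    return word_counts, label_counts


def spam_log_odds(text, word_counts, label_counts):
    """Positive result: the words look more like spam than like normal mail."""
    vocab = set(word_counts[True]) | set(word_counts[False])
    spam_total = sum(word_counts[True].values())
    ham_total = sum(word_counts[False].values())
    log_odds = math.log(label_counts[True] / label_counts[False])
    for word in text.lower().split():
        # Add-one smoothing so unseen words don't produce a zero probability.
        p_spam = (word_counts[True][word] + 1) / (spam_total + len(vocab))
        p_ham = (word_counts[False][word] + 1) / (ham_total + len(vocab))
        log_odds += math.log(p_spam / p_ham)
    return log_odds


# Hypothetical training data.
messages = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting notes attached", False),
    ("lunch at noon tomorrow", False),
]
word_counts, label_counts = train(messages)
print(spam_log_odds("free money", word_counts, label_counts) > 0)
print(spam_log_odds("meeting tomorrow", word_counts, label_counts) > 0)
```

The same idea carries over to other clues, such as sender or link patterns, by counting how often each clue appears in messages that people did or did not flag.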

Ariel: Is there anything that you’re worried about, based on advances that you’ve seen in the last year?

Ian: The employment issue. As we’re able to automate our tasks in the future, how will we make sure that everyone benefits from that automation? The way that society is structured right now, increasing automation seems to lead to increasing concentration of wealth, and there are winners and losers to every advance. My concern is that automating jobs that are done by millions of people will create very many losers and a small number of winners who really win big.

Richard: I’m also slightly concerned with the speed at which we’re approaching additional generality. It’s extremely cool to see systems be able to do lots of different things, and being able to do tasks that they’ve either seen very little of or none of before. But it raises questions as to when we implement different types of safety techniques. I don’t think that we’re at that point yet, but it raises the issue.

Ariel: To end on a positive note: looking back on what you saw last year, what has you most hopeful for our future?

Ian: I think it’s really great that AI is starting to be used for things like medicine. In the last year we’ve seen a lot of different machine learning algorithms that could exceed human abilities at some tasks, and we’ve also started to see the application of AI to life-saving application areas like designing new medicines. And this makes me very hopeful that we’re going to start seeing superhuman drug design, and other kinds of applications of AI to just really make life better for a lot of people in ways that we would not have been able to do without it.

Richard: Various kinds of tasks that people find to be drudgery within their jobs will be automatable. That will lead them to be open to working on more value-added things with more creativity, and potentially be able to work in more interesting areas of their field or across different fields. I think the future is wide open and it’s really what we make of it, which is exciting in itself.

## Why 2016 Was Actually a Year of Hope

Just about everyone found something to dislike about 2016, from wars to politics and celebrity deaths. But hidden within this year’s news feeds were some really exciting news stories. And some of them can even give us hope for the future.

## Artificial Intelligence

Though concerns about the future of AI still loom, 2016 was a great reminder that, when harnessed for good, AI can help humanity thrive.

#### AI and Health

Some of the most promising and hopefully more immediate breakthroughs and announcements were related to health. Google’s DeepMind announced a new division that would focus on helping doctors improve patient care. Harvard Business Review considered what an AI-enabled hospital might look like, which would improve the hospital experience for the patient, the doctor, and even the patient’s visitors and loved ones. A breakthrough from MIT researchers could see AI used to more quickly and effectively design new drug compounds that could be applied to a range of health needs.

More specifically, Microsoft wants to cure cancer, and the company has been working with research labs and doctors around the country to use AI to improve cancer research and treatment. But Microsoft isn’t the only company that hopes to cure cancer. DeepMind Health also partnered with University College London’s hospitals to apply machine learning to diagnose and treat head and neck cancers.

#### AI and Society

Other researchers are turning to AI to help solve social issues. While AI has what is known as the “white guy problem” and examples of bias cropped up in many news articles, Fei-Fei Li has been working with STEM girls at Stanford to bridge the gender gap. Stanford researchers also published research that suggests artificial intelligence could help us use satellite data to combat global poverty.

It was also a big year for research on how to keep artificial intelligence safe as it continues to develop. Google and the Future of Humanity Institute made big headlines with their work to design a “kill switch” for AI. Google Brain also published a research agenda on various problems AI researchers should be studying now to help ensure safe AI for the future.

Even the White House got involved in AI this year, hosting four symposia on AI and releasing reports in October and December about the potential impact of AI and the necessary areas of research. The White House reports are especially focused on the possible impact of automation on the economy, but they also look at how the government can contribute to AI safety, especially in the near future.

#### AI in Action

And of course there was AlphaGo. In January, Google’s DeepMind published a paper, which announced that the company had created a program, AlphaGo, that could beat one of Europe’s top Go players. Then, in March, in front of a live audience, AlphaGo beat the reigning world champion of Go in four out of five games. These results took the AI community by surprise and indicate that artificial intelligence may be progressing more rapidly than many in the field realized.

And AI went beyond research labs this year to be applied practically and beneficially in the real world. Perhaps most hopeful was some of the news that came out about the ways AI has been used to address issues connected with pollution and climate change. For example, IBM has had increasing success with a program that can forecast pollution in China, giving residents advance warning about days of especially bad air. Meanwhile, Google was able to reduce its power usage by using DeepMind’s AI to manipulate things like its cooling systems.

And speaking of addressing climate change…

## Climate Change

With recent news from climate scientists indicating that climate change may be coming on faster and stronger than previously anticipated and with limited political action on the issue, 2016 may not have made climate activists happy. But even here, there was some hopeful news.

Among the biggest news was the ratification of the Paris Climate Agreement. But more generally, countries, communities and businesses came together on various issues of global warming, and Voice of America offers five examples of how this was a year of incredible, global progress.

But there was also news of technological advancements that could soon help us address climate issues more effectively. Scientists at Oak Ridge National Laboratory have discovered a way to convert CO2 into ethanol. A researcher from UC Berkeley has developed a method for artificial photosynthesis, which could help us more effectively harness the energy of the sun. And a multi-disciplinary team has genetically engineered bacteria that could be used to help combat global warming.

## Biotechnology

Biotechnology, with fears of designer babies and manmade pandemics, is easily one of the most feared technologies. But rather than causing harm, the latest biotech advances could help to save millions of people.

#### CRISPR

In the course of about two years, CRISPR-Cas9 went from a new development to what could become one of the world’s greatest advances in biology. Results of studies early in the year were promising, but as the year progressed, the news just got better. CRISPR was used to successfully remove HIV from human immune cells. A team in China used CRISPR on a patient for the first time in an attempt to treat lung cancer (treatments are still ongoing), and researchers in the US have also received approval to test CRISPR cancer treatment in patients. And CRISPR was also used to partially restore sight to blind animals.

#### Gene Drive

Where CRISPR could have the most dramatic, life-saving effect is in gene drives. By using CRISPR to modify the genes of an invasive species, we could potentially eliminate the unwelcome plant or animal, reviving the local ecology and saving native species that may be on the brink of extinction. But perhaps most impressive is the hope that gene drive technology could be used to end mosquito- and tick-borne diseases, such as malaria, dengue, Lyme, etc. Eliminating these diseases could easily save over a million lives every year.

#### Other Biotech News

The year saw other biotech advances as well. Researchers at MIT addressed a major problem in synthetic biology in which engineered genetic circuits interfere with each other. Another team at MIT engineered an antimicrobial peptide that can eliminate many types of bacteria, including some of the antibiotic-resistant “superbugs.” And various groups are also using CRISPR to create new ways to fight antibiotic-resistant bacteria.

## Nuclear Weapons

If ever there was a topic that does little to inspire hope, it’s nuclear weapons. Yet even here we saw some positive signs this year. The Cambridge City Council voted to divest their $1 billion pension fund from any companies connected with nuclear weapons, which earned them an official commendation from the U.S. Conference of Mayors. In fact, divestment may prove a useful tool for the general public to express their displeasure with nuclear policy, which will be good, since one cause for hope is that the growing awareness of the nuclear weapons situation will help stigmatize the new nuclear arms race. In February, Londoners held the largest anti-nuclear rally Britain had seen in decades, and the following month MinutePhysics posted a video about nuclear weapons that’s been seen by nearly 1.3 million people. In May, scientific and religious leaders came together to call for steps to reduce nuclear risks. And all of that pales in comparison to the attention the U.S. elections brought to the risks of nuclear weapons. As awareness of nuclear risks grows, so do our chances of instigating the change necessary to reduce those risks.

## The United Nations Takes on Weapons

But if awareness alone isn’t enough, then recent actions by the United Nations may instead be a source of hope. As October came to a close, the United Nations voted to begin negotiations on a treaty that would ban nuclear weapons. While this might not have an immediate impact on nuclear weapons arsenals, the stigmatization caused by such a ban could increase pressure on countries and companies driving the new nuclear arms race. The U.N. also announced recently that it would officially begin looking into the possibility of a ban on lethal autonomous weapons, a cause that’s been championed by Elon Musk, Steve Wozniak, Stephen Hawking and thousands of AI researchers and roboticists in an open letter.

## Looking Ahead

And why limit our hope and ambition to merely one planet? This year, a group of influential scientists led by Yuri Milner announced an Alpha-Centauri starshot, in which they would send a rocket of space probes to our nearest star system. Elon Musk later announced his plans to colonize Mars. And an MIT scientist wants to make all of these trips possible for humans by using CRISPR to reengineer our own genes to keep us safe in space.

Yet for all of these exciting events and breakthroughs, perhaps what’s most inspiring and hopeful is that this represents only a tiny sampling of all of the amazing stories that made the news this year. If trends like these keep up, there’s plenty to look forward to in 2017.

## Podcast: FLI 2016 – A Year In Review

For FLI, 2016 was a great year, full of our own success, but also great achievements from so many of the organizations we work with. Max, Meia, Anthony, Victoria, Richard, Lucas, David, and Ariel discuss what they were most excited to see in 2016 and what they’re looking forward to in 2017.

AGUIRRE: I’m Anthony Aguirre. I am a professor of physics at UC Santa Cruz, and I’m one of the founders of the Future of Life Institute.

STANLEY: I’m David Stanley, and I’m currently working with FLI as a Project Coordinator/Volunteer Coordinator.

PERRY: My name is Lucas Perry, and I’m a Project Coordinator with the Future of Life Institute.

TEGMARK: I’m Max Tegmark, and I have the fortune to be the President of the Future of Life Institute.

CHITA-TEGMARK: I’m Meia Chita-Tegmark, and I am a co-founder of the Future of Life Institute.

MALLAH: Hi, I’m Richard Mallah. I’m the Director of AI Projects at the Future of Life Institute.

KRAKOVNA: Hi everyone, I am Victoria Krakovna, and I am one of the co-founders of FLI. I’ve recently taken up a position at Google DeepMind working on AI safety.

CONN: And I’m Ariel Conn, the Director of Media and Communications for FLI. 2016 has certainly had its ups and downs, and so at FLI, we count ourselves especially lucky to have had such a successful year. We’ve continued to progress with the field of AI safety research, we’ve made incredible headway with our nuclear weapons efforts, and we’ve worked closely with many amazing groups and individuals. On that last note, much of what we’ve been most excited about throughout 2016 is the great work these other groups in our fields have also accomplished. Over the last couple of weeks, I’ve sat down with our founders and core team to rehash their highlights from 2016 and also to learn what they’re all most looking forward to as we move into 2017. To start things off, Max gave a summary of the work that FLI does and why 2016 was such a success.

TEGMARK: What I was most excited by in 2016 was the overall sense that people are taking seriously this idea – that we really need to win this race between the growing power of our technology and the wisdom with which we manage it. Every single way in which 2016 is better than the Stone Age is because of technology, and I’m optimistic that we can create a fantastic future with tech as long as we win this race. But in the past, the way we’ve kept one step ahead is always by learning from mistakes. We invented fire, messed up a bunch of times, and then invented the fire extinguisher. We at the Future of Life Institute feel that that strategy of learning from mistakes is a terrible idea for more powerful tech, like nuclear weapons, artificial intelligence, and things that can really alter the climate of our globe.

Now, in 2016 we saw multiple examples of people trying to plan ahead and to avoid problems with technology instead of just stumbling into them. In April, we had world leaders getting together and signing the Paris Climate Accords. In November, the United Nations General Assembly voted to start negotiations about nuclear weapons next year. The question is whether they should actually ultimately be phased out; whether the nations that don’t have nukes should work towards stigmatizing building more of them – with the idea that 14,000 is way more than anyone needs for deterrence. And – just the other day – the United Nations also decided to start negotiations on the possibility of banning lethal autonomous weapons, which is another arms race that could be very, very destabilizing. And if we keep this positive momentum, I think there’s really good hope that all of these technologies will end up having mainly beneficial uses.

Today, we think of our biologist friends as mainly responsible for the fact that we live longer and healthier lives, and not as those guys who make the bioweapons. We think of chemists as providing us with better materials and new ways of making medicines, not as the people who built chemical weapons and are all responsible for global warming. We think of AI scientists as – I hope, when we look back on them in the future – as people who helped make the world better, rather than the ones who just brought on the AI arms race. And it’s very encouraging to me that as much as people in general – but also the scientists in all these fields – are really stepping up and saying, “Hey, we’re not just going to invent this technology, and then let it be misused. We’re going to take responsibility for making sure that the technology is used beneficially.”

CONN: And beneficial AI is what FLI is primarily known for. So what did the other members have to say about AI safety in 2016? We’ll hear from Anthony first.

AGUIRRE: I would say that what has been great to see over the last year or so is the AI safety and beneficiality research field really growing into an actual research field. When we ran our first conference a couple of years ago, they were these tiny communities who had been thinking about the impact of artificial intelligence in the future and in the long-term future. They weren’t really talking to each other; they weren’t really doing much actual research – there wasn’t funding for it. So, to see in the last few years that transform into something where it takes a massive effort to keep track of all the stuff that’s being done in this space now. All the papers that are coming out, the research groups – you sort of used to be able to just find them all, easily identified. Now, there’s this huge worldwide effort and long lists, and it’s difficult to keep track of. And that’s an awesome problem to have.

As someone who’s not in the field, but sort of watching the dynamics of the research community, that’s what’s been so great to see. A research community that wasn’t there before really has started, and I think in the past year we’re seeing the actual results of that research start to come in. You know, it’s still early days. But it’s starting to come in, and we’re starting to see papers that have been basically created using these research talents and the funding that’s come through the Future of Life Institute. It’s been super gratifying. And seeing that it’s a fairly large amount of money – but fairly small compared to the total amount of research funding in artificial intelligence or other fields – but because it was so funding-starved and talent-starved before, it’s just made an enormous impact. And that’s been nice to see.

CONN: Not surprisingly, Richard was equally excited to see AI safety becoming a field of ever-increasing interest for many AI groups.

MALLAH: I’m most excited by the continued mainstreaming of AI safety research. There are more and more publications coming out by places like DeepMind and Google Brain that have really lent additional credibility to the space, as well as a continued uptake of more and more professors, and postdocs, and grad students from a wide variety of universities entering this space. And, of course, OpenAI has come out with a number of useful papers and resources.
I’m also excited that governments have really realized that this is an important issue. So, while the White House reports have come out recently focusing more on near-term AI safety research, they did note that longer-term concerns like superintelligence are not necessarily unreasonable for later this century. And that they do support – right now – funding safety work that can scale toward the future, which is really exciting. We really need more funding coming into the community for that type of research. Likewise, other governments – like the U.K. and Japan, Germany – have all made very positive statements about AI safety in one form or another. And other governments around the world. CONN: In addition to seeing so many other groups get involved in AI safety, Victoria was also pleased to see FLI taking part in so many large AI conferences. KRAKOVNA: I think I’ve been pretty excited to see us involved in these AI safety workshops at major conferences. So on the one hand, our conference in Puerto Rico that we organized ourselves was very influential and helped to kick-start making AI safety more mainstream in the AI community. On the other hand, it felt really good in 2016 to complement that with having events that are actually part of major conferences that were co-organized by a lot of mainstream AI researchers. I think that really was an integral part of the mainstreaming of the field. For example, I was really excited about the Reliable Machine Learning workshop at ICML that we helped to make happen. I think that was something that was quite positively received at the conference, and there was a lot of good AI safety material there. CONN: And of course, Victoria was also pretty excited about some of the papers that were published this year connected to AI safety, many of which received at least partial funding from FLI. KRAKOVNA: There were several excellent papers in AI safety this year, addressing core problems in safety for machine learning systems. 
For example, there was a paper from Stuart Russell’s lab published at NIPS, on cooperative IRL. This is about teaching AI what humans want – how to train an RL algorithm to learn the right reward function that reflects what humans want it to do. DeepMind and FHI published a paper at UAI on safely interruptible agents, that formalizes what it means for an RL agent not to have incentives to avoid shutdown. MIRI made an impressive breakthrough with their paper on logical inductors. I’m super excited about all these great papers coming out, and that our grant program contributed to these results. CONN: For Meia, the excitement about AI safety went beyond just the technical aspects of artificial intelligence. CHITA-TEGMARK: I am very excited about the dialogue that FLI has catalyzed – and also engaged in – throughout 2016, and especially regarding the impact of technology on society. My training is in psychology; I’m a psychologist. So I’m very interested in the human aspect of technology development. I’m very excited about questions like, how are new technologies changing us? How ready are we to embrace new technologies? Or how our psychological biases may be clouding our judgement about what we’re creating and the technologies that we’re putting out there. Are these technologies beneficial for our psychological well-being, or are they not? So it has been extremely interesting for me to see that these questions are being asked more and more, especially by artificial intelligence developers and also researchers. I think it’s so exciting to be creating technologies that really force us to grapple with some of the most fundamental aspects, I would say, of our own psychological makeup. For example, our ethical values, our sense of purpose, our well-being, maybe our biases and shortsightedness and shortcomings as biological human beings. 
So I’m definitely very excited about how the conversation regarding technology – and especially artificial intelligence – has evolved over the last year. I like the way it has expanded to capture this human element, which I find so important. But I’m also so happy to feel that FLI has been an important contributor to this conversation. CONN: Meanwhile, as Max described earlier, FLI has also gotten much more involved in decreasing the risk of nuclear weapons, and Lucas helped spearhead one of our greatest accomplishments of the year. PERRY: One of the things that I was most excited about was our success with our divestment campaign. After a few months, we had great success in our own local Boston area with helping the City of Cambridge to divest its$1 billion portfolio from nuclear weapon producing companies. And we see this as a really big and important victory within our campaign to help institutions, persons, and universities to divest from nuclear weapons producing companies.

CONN: And in order to truly be effective we need to reach an international audience, which is something Dave has been happy to see grow this year.

STANLEY: I’m mainly excited about – at least, in my work – the increasing involvement and response we’ve had from the international community in terms of reaching out about these issues. I think it’s pretty important that we engage the international community more, and not just academics. Because these issues – things like nuclear weapons and the increasing capabilities of artificial intelligence – really will affect everybody. And they seem to be really underrepresented in mainstream media coverage as well.

So far, we’ve had pretty good responses just in terms of volunteers from many different countries around the world being interested in getting involved to help raise awareness in their respective communities, either through helping develop apps for us, or translation, or promoting just through social media these ideas in their little communities.

CONN: Many FLI members also participated in both local and global events and projects, like the ones we’re about to hear about from Victoria, Richard, Lucas, and Meia.

KRAKOVNA: The EAGX Oxford Conference was a fairly large conference. It was very well organized, and we had a panel there with Demis Hassabis, Nate Soares from MIRI, Murray Shanahan from Imperial, Toby Ord from FHI, and myself. I feel like overall, that conference did a good job of, for example, connecting the local EA community with the people at DeepMind, who are really thinking about AI safety concerns like Demis and also Sean Legassick, who also gave a talk about the ethics and impacts side of things. So I feel like that conference overall did a good job of connecting people who are thinking about these sorts of issues, which I think is always a great thing.

MALLAH: I was involved in this endeavor with IEEE regarding autonomy and ethics in autonomous systems, sort of representing FLI’s positions on things like autonomous weapons and long-term AI safety. One thing that came out this year – just a few days ago, actually, due to this work from IEEE – is that the UN actually took the report pretty seriously, and it may have influenced their decision to take up the issue of autonomous weapons formally next year. That’s kind of heartening.

PERRY: A few different things that I really enjoyed doing were giving a few different talks at Duke and Boston College, and a local effective altruism conference. I’m also really excited about all the progress we’re making on our nuclear divestment application. So this is an application that will allow anyone to search their mutual fund and see whether or not their mutual funds have direct or indirect holdings in nuclear weapons-producing companies.

CHITA-TEGMARK: So, a wonderful moment for me was at the conference organized by Yann LeCun in New York at NYU, when Daniel Kahneman, one of my thinker-heroes, asked a very important question that really left the whole audience in silence. He asked, “Does this make you happy? Would AI make you happy? Would the development of a human-level artificial intelligence make you happy?” I think that was one of the defining moments, and I was very happy to participate in this conference.

Later on, David Chalmers, another one of my thinker-heroes – this time, not the psychologist but the philosopher – organized another conference, again at NYU, trying to bring philosophers into this very important conversation about the development of artificial intelligence. And again, I felt there too, that FLI was able to contribute and bring in this perspective of the social sciences on this issue.

CONN: Now, with 2016 coming to an end, it’s time to turn our sights to 2017, and FLI is excited for this new year to be even more productive and beneficial.

TEGMARK: We at the Future of Life Institute are planning to focus primarily on artificial intelligence, and on reducing the risk of accidental nuclear war in various ways. We’re kicking off by having an international conference on artificial intelligence, and then we want to continue throughout the year providing really high-quality and easily accessible information on all these key topics, to help inform on what happens with climate change, with nuclear weapons, with lethal autonomous weapons, and so on.

And looking ahead here, I think it’s important right now – especially since a lot of people are very stressed out about the political situation in the world, about terrorism, and so on – to not ignore the positive trends and the glimmers of hope we can see as well.

CONN: As optimistic as FLI members are about 2017, we’re all also especially hopeful and curious to see what will happen with continued AI safety research.

AGUIRRE: I would say I’m looking forward to seeing in the next year more of the research that comes out, and really sort of delving into it myself, and understanding how the field of artificial intelligence and artificial intelligence safety is developing. And I’m very interested in this from the forecast and prediction standpoint.

I’m interested in trying to draw some of the AI community into really understanding how artificial intelligence is unfolding – in the short term and the medium term – as a way to understand, how long do we have? Is it, you know, if it’s really infinity, then let’s not worry about that so much, and spend a little bit more on nuclear weapons and global warming and biotech, because those are definitely happening. If human-level AI were 8 years away… honestly, I think we should be freaking out right now. And most people don’t believe that, I think most people are in the middle it seems, of thirty years or fifty years or something, which feels kind of comfortable. Although it’s not that long, really, on the big scheme of things. But I think it’s quite important to know now, which is it? How fast are these things, how long do we really have to think about all of the issues that FLI has been thinking about in AI? How long do we have before most jobs in industry and manufacturing are replaceable by a robot being slotted in for a human? That may be 5 years, it may be fifteen… It’s probably not fifty years at all. And having a good forecast on those good short-term questions I think also tells us what sort of things we have to be thinking about now.

And I’m interested in seeing how this massive AI safety community that’s started develops. It’s amazing to see centers kind of popping up like mushrooms after a rain all over and thinking about artificial intelligence safety. This partnership on AI between Google and Facebook and a number of other large companies getting started. So to see how those different individual centers will develop and how they interact with each other. Is there an overall consensus on where things should go? Or is it a bunch of different organizations doing their own thing? Where will governments come in on all of this? I think it will be interesting times. So I look forward to seeing what happens, and I will reserve judgement in terms of my optimism.

KRAKOVNA: I’m really looking forward to AI safety becoming even more mainstream, and even more of the really good researchers in AI giving it serious thought. Something that happened in the past year that I was really excited about, that I think is also pointing in this direction, is the research agenda that came out of Google Brain called “Concrete Problems in AI Safety.” And I think I’m looking forward to more things like that happening, where AI safety becomes sufficiently mainstream that people who are working in AI just feel inspired to do things like that and just think from their own perspectives: what are the important problems to solve in AI safety? And work on them.

I’m a believer in the portfolio approach with regards to AI safety research, where I think we need a lot of different research teams approaching the problems from different angles and making different assumptions, and hopefully some of them will make the right assumption. I think we are really moving in the direction in terms of more people working on these problems, and coming up with different ideas. And I look forward to seeing more of that in 2017. I think FLI can also help continue to make this happen.

MALLAH: So, we’re in the process of fostering additional collaboration among people in the AI safety space. And we will have more announcements about this early next year. We’re also working on resources to help people better visualize and better understand the space of AI safety work, and the opportunities there and the work that has been done. Because it’s actually quite a lot.

I’m also pretty excited about fostering continued theoretical work and practical work in making AI more robust and beneficial. The work in value alignment, for instance, is not something we see supported in mainstream AI research. And this is something that is pretty crucial to the way that advanced AIs will need to function. It won’t be very explicit instructions to them; they’ll have to be making decisions based on what they think is right. And what is right? It’s something that… or even structuring the way to think about what is right requires some more research.

STANLEY: We’ve had pretty good success at FLI in the past few years helping to legitimize the field of AI safety. And I think it’s going to be important because AI is playing a large role in industry and there’s a lot of companies working on this, and not just in the US. So I think increasing international awareness about AI safety is going to be really important.

CHITA-TEGMARK: I believe that the AI community has raised some very important questions in 2016 regarding the impact of AI on society. I feel like 2017 should be the year to make progress on these questions, and actually research them and have some answers to them. For this, I think we need more social scientists – among people from other disciplines – to join this effort of really systematically investigating what would be the optimal impact of AI on people. I hope that in 2017 we will have more research initiatives, that we will attempt to systematically study other burning questions regarding the impact of AI on society. Some examples are: how can we ensure the psychological well-being of people while AI creates lots of displacement on the job market, as many people predict? How do we optimize engagement with technology, and withdrawal from it also? Will some people be left behind, like the elderly or the economically disadvantaged? How will this affect them, and how will this affect society at large?

What about withdrawal from technology? What about satisfying our need for privacy? Will we be able to do that, or is the price of having more and more customized technologies and more and more personalization of the technologies we engage with… will that mean that we will have no privacy anymore, or that our expectations of privacy will be very seriously violated? I think these are some very important questions that I would love to get some answers to. And my wish, and also my resolution, for 2017 is to see more progress on these questions, and to hopefully also be part of this work and answering them.

PERRY: In 2017 I’m very interested in pursuing the landscape of different policy and principle recommendations from different groups regarding artificial intelligence. I’m also looking forward to expanding our nuclear divestment campaign by trying to introduce divestment to new universities, institutions, communities, and cities.

CONN: In fact, some experts believe nuclear weapons pose a greater threat now than at any time during our history.

TEGMARK: I personally feel that the greatest threat to the world in 2017 is one that the newspapers almost never write about. It’s not terrorist attacks, for example. It’s the small but horrible risk that the U.S. and Russia for some stupid reason get into an accidental nuclear war against each other. We have 14,000 nuclear weapons, and this war has almost happened many, many times. So, actually what’s quite remarkable and really gives a glimmer of hope is that – however people may feel about Putin and Trump – the fact is they are both signaling strongly that they are eager to get along better. And if that actually pans out and they manage to make some serious progress in nuclear arms reduction, that would make 2017 the best year for nuclear weapons we’ve had in a long, long time, reversing this trend of ever greater risks with ever more lethal weapons.

CONN: Some FLI members are also looking beyond nuclear weapons and artificial intelligence, as I learned when I asked Dave about other goals he hopes to accomplish with FLI this year.

STANLEY: Definitely having the volunteer team – particularly the international volunteers – continue to grow, and then scale things up. Right now, we have a fairly committed core of people who are helping out, and we think that they can start recruiting more people to help out in their little communities, and really making this stuff accessible. Not just to academics, but to everybody. And that’s also reflected in the types of people we have working for us as volunteers. They’re not just academics. We have programmers, linguists, people having just high school degrees all the way up to Ph.D.’s, so I think it’s pretty good that this varied group of people can get involved and contribute, and also reach out to other people they can relate to.

CONN: In addition to getting more people involved, Meia also pointed out that one of the best ways we can help ensure a positive future is to continue to offer people more informative content.

CHITA-TEGMARK: Another thing that I’m very excited about regarding our work here at the Future of Life Institute is this mission of empowering people to information. I think information is very powerful and can change the way people approach things: they can change their beliefs, their attitudes, and their behaviors as well. And by creating ways in which information can be readily distributed to the people, and with which they can engage very easily, I hope that we can create changes. For example, we’ve had a series of different apps regarding nuclear weapons that I think have contributed a lot to people’s knowledge and have brought this issue to the forefront of their thinking.

CONN: Yet as important as it is to highlight the existential risks we must address to keep humanity safe, perhaps it’s equally important to draw attention to the incredible hope we have for the future if we can solve these problems. Which is something both Richard and Lucas brought up for 2017.

MALLAH: I’m excited about trying to foster more positive visions of the future, so focusing on existential hope aspects of the future. Which are kind of the flip side of existential risks. So we’re looking at various ways of getting people to be creative about understanding some of the possibilities, and how to differentiate the paths between the risks and the benefits.

PERRY: Yeah, I’m also interested in creating and generating a lot more content that has to do with existential hope. Given the current global political climate, it’s all the more important to focus on how we can make the world better.

CONN: And on that note, I want to mention one of the most amazing things I discovered this past year. It had nothing to do with technology, and everything to do with people. Since starting at FLI, I’ve met countless individuals who are dedicating their lives to trying to make the world a better place. We may have a lot of problems to solve, but with so many groups focusing solely on solving them, I’m far more hopeful for the future. There are truly too many individuals that I’ve met this year to name them all, so instead, I’d like to provide a rather long list of groups and organizations I’ve had the pleasure to work with this year. A link to each group can be found at futureoflife.org/2016, and I encourage you to visit them all to learn more about the wonderful work they’re doing. In no particular order, they are:

Machine Intelligence Research Institute

Future of Humanity Institute

Global Catastrophic Risk Institute

Centre for the Study of Existential Risk

Ploughshares Fund

Bulletin of the Atomic Scientists

Open Philanthropy Project

Union of Concerned Scientists

The William Perry Project

ReThink Media

Don’t Bank on the Bomb

Federation of American Scientists

Massachusetts Peace Action

IEEE (Institute of Electrical and Electronics Engineers)

Center for Human-Compatible Artificial Intelligence

Centre for Effective Altruism

Center for Applied Rationality

Foresight Institute

Leverhulme Centre for the Future of Intelligence

Global Priorities Project

Association for the Advancement of Artificial Intelligence

International Joint Conference on Artificial Intelligence

Partnership on AI

The White House Office of Science and Technology Policy

The Future Society at Harvard Kennedy School

I couldn’t be more excited to see what 2017 holds in store for us, and all of us at FLI look forward to doing all we can to help create a safe and beneficial future for everyone. But to end on an even more optimistic note, I turn back to Max.

TEGMARK: Finally, I’d like – because I spend a lot of my time thinking about our universe – to remind everybody that we shouldn’t just be focused on the next election cycle. We have not decades, but billions of years of potentially awesome future for life, on Earth and far beyond. And it’s so important to not let ourselves get so distracted by our everyday little frustrations that we lose sight of these incredible opportunities that we all stand to gain from if we can get along, and focus, and collaborate, and use technology for good.

## AI Safety Highlights from NIPS 2016

This year’s Neural Information Processing Systems (NIPS) conference was larger than ever, with almost 6000 people attending, hosted in a huge convention center in Barcelona, Spain. The conference started off with two exciting announcements on open-sourcing collections of environments for training and testing general AI capabilities – the DeepMind Lab and the OpenAI Universe. Among other things, this is promising for testing safety properties of ML algorithms. OpenAI has already used their Universe environment to give an entertaining and instructive demonstration of reward hacking that illustrates the challenge of designing robust reward functions.

I was happy to see a lot of AI-safety-related content at NIPS this year. The ML and the Law symposium and Interpretable ML for Complex Systems workshop focused on near-term AI safety issues, while the Reliable ML in the Wild workshop also covered long-term problems. Here are some papers relevant to long-term AI safety:

#### Inverse Reinforcement Learning

Cooperative Inverse Reinforcement Learning (CIRL) by Hadfield-Menell, Russell, Abbeel, and Dragan (main conference). This paper addresses the value alignment problem by teaching the artificial agent about the human’s reward function, using instructive demonstrations rather than optimal demonstrations like in classical IRL (e.g. showing the robot how to make coffee vs having it observe coffee being made). (3-minute video)

Generalizing Skills with Semi-Supervised Reinforcement Learning by Finn, Yu, Fu, Abbeel, and Levine (Deep RL workshop). This work addresses the scalable oversight problem by proposing the first tractable algorithm for semi-supervised RL. This allows artificial agents to robustly learn reward functions from limited human feedback. The algorithm uses an IRL-like approach to infer the reward function, using the agent’s own prior experiences in the supervised setting as an expert demonstration.

Towards Interactive Inverse Reinforcement Learning by Armstrong and Leike (Reliable ML workshop). This paper studies the incentives of an agent that is trying to learn about the reward function while simultaneously maximizing the reward. The authors discuss some ways to reduce the agent’s incentive to manipulate the reward learning process.

Should Robots Have Off Switches? by Milli, Hadfield-Menell, and Russell (Reliable ML workshop). This poster examines some adverse effects of incentivizing artificial agents to be compliant in the off-switch game (a variant of CIRL).
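For readers unfamiliar with the off-switch game, the toy calculation below may help. It is a minimal sketch under strong simplifying assumptions, not the authors' model: the robot's belief about the utility U of its proposed action is a list of equally weighted samples, switching off is worth 0, and an idealized human always knows U and permits the action only when U > 0. The function name `deference_incentive` and the sample values are hypothetical.

```python
from statistics import mean


def deference_incentive(utility_samples):
    """Return how much the robot gains by deferring to the human.

    utility_samples: equally weighted samples from the robot's belief
    about the utility U of its proposed action (assumed representation).
    """
    samples = list(utility_samples)
    act_now = mean(samples)      # act without asking: E[U]
    switch_off = 0.0             # switch itself off: utility 0
    # Defer: the idealized human lets the action proceed only when U > 0,
    # so the robot expects E[max(U, 0)].
    defer = mean([max(u, 0.0) for u in samples])
    return defer - max(act_now, switch_off)


certain = [1.0, 1.0, 1.0, 1.0]       # robot is sure the action is good
uncertain = [-3.0, -1.0, 1.0, 3.0]   # robot is unsure about the sign of U

print(deference_incentive(certain))
print(deference_incentive(uncertain))
```

Under these assumptions the gain from deferring is never negative, and it shrinks to zero once the robot is certain about the sign of U, which is one way to see why uncertainty about its own objective gives an agent a reason to leave the off switch alone.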

#### Safe Exploration

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes by Turchetta, Berkenkamp, and Krause (main conference). This paper develops a reinforcement learning algorithm called Safe MDP that can explore an unknown environment without getting into irreversible situations, unlike classical RL approaches.

Combating Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear by Lipton, Gao, Li, Chen, and Deng (Reliable ML workshop). This work addresses the ‘Sisyphean curse’ of DQN algorithms forgetting past experiences, as they become increasingly unlikely under a new policy, and therefore eventually repeating catastrophic mistakes. The paper introduces an approach called ‘intrinsic fear’, which maintains a model for how likely different states are to lead to a catastrophe within some number of steps.
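The idea can be sketched in a few lines. This is a rough illustration, not the paper's implementation: it assumes a separately trained fear model has already produced `fear_prob`, an estimated probability that the next state leads to catastrophe within some number of steps, and it subtracts a scaled copy of that estimate from an ordinary Q-learning target. The function name, the `fear_coeff` parameter, and the exact form of the penalty are assumptions made for illustration.

```python
def fear_penalized_target(reward, next_q_values, fear_prob,
                          gamma=0.99, fear_coeff=1.0, terminal=False):
    """Q-learning target with an intrinsic-fear style penalty.

    reward        -- environment reward for the transition
    next_q_values -- Q-value estimates for each action in the next state
    fear_prob     -- assumed output of a fear model, in [0, 1]
    gamma         -- discount factor
    fear_coeff    -- hypothetical weight on the fear penalty
    terminal      -- True if the next state ends the episode
    """
    if not 0.0 <= fear_prob <= 1.0:
        raise ValueError("fear_prob must be a probability in [0, 1]")
    # Standard bootstrapped value of the next state (zero at episode end).
    bootstrap = 0.0 if terminal else gamma * max(next_q_values)
    # States judged likely to precede a catastrophe are made less attractive.
    return reward + bootstrap - fear_coeff * fear_prob


safe = fear_penalized_target(1.0, [0.5, 2.0], fear_prob=0.0, gamma=0.5)
risky = fear_penalized_target(1.0, [0.5, 2.0], fear_prob=0.5,
                              gamma=0.5, fear_coeff=4.0)
print(safe, risky)
```

Because the penalty depends on the fear model rather than on how often the agent currently visits dangerous states, the discouragement persists even after those states have become rare in the agent's recent experience.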

Most of these papers were related to inverse reinforcement learning – while IRL is a promising approach, it would be great to see more varied safety material at the next NIPS. There were some more safety papers on other topics at UAI this summer: Safely Interruptible Agents (formalizing what it means to incentivize an agent to obey shutdown signals) and A Formal Solution to the Grain of Truth Problem (providing a broad theoretical framework for multiple agents learning to predict each other in arbitrary computable games).

These highlights were originally posted here and cross-posted to Approximately Correct. Thanks to Jan Leike, Zachary Lipton, and Janos Kramar for providing feedback on this post.

We’re in the final weeks of our push to cover our funding shortfall, and we’re now halfway to our \$160,000 goal. For potential donors who are interested in an outside perspective, Future of Humanity Institute (FHI) researcher Owen Cotton-Barratt has written up why he’s donating to MIRI this year. (Donation page.)

#### Research updates

• We teamed up with a number of AI safety researchers to help compile a list of recommended AI safety readings for the Center for Human-Compatible AI. See this page if you would like to get involved with CHCAI’s research.
• Investment analyst Ben Hoskin reviews MIRI and other organizations involved in AI safety.

• “The Off-Switch Game”: Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, and Stuart Russell show that an AI agent’s corrigibility is closely tied to the uncertainty it has about its utility function.
• Russell and Allan Dafoe critique an inaccurate summary by Oren Etzioni of a new survey of AI experts on superintelligence.
• Sam Harris interviews Russell on the basics of AI risk (video). See also Russell’s new Q&A on the future of AI.
• Future of Life Institute co-founder Viktoriya Krakovna and FHI researcher Jan Leike join Google DeepMind’s safety team.
• GoodAI sponsors a challenge to “accelerate the search for general artificial intelligence”.
• OpenAI releases Universe, “a software platform for measuring and training an AI’s general intelligence across the world’s supply of games”. Meanwhile, DeepMind has open-sourced their own platform for general AI research, DeepMind Lab.
• Staff at GiveWell and the Centre for Effective Altruism, along with others in the effective altruism community, explain where they’re donating this year.
• FHI is seeking AI safety interns, researchers, and admins: jobs page.

This newsletter was originally posted here.

## Silo Busting in AI Research

Artificial intelligence may seem like a computer science project, but if it’s going to successfully integrate with society, then social scientists must be more involved.

Developing an intelligent machine is not merely a problem of modifying algorithms in a lab. These machines must be aligned with human values, and this requires a deep understanding of ethics and the social consequences of deploying intelligent machines.

Getting people with a variety of backgrounds together seems logical enough in theory, but in practice, what happens when computer scientists, AI developers, economists, philosophers, and psychologists try to discuss AI issues? Do any of them even speak the same language?

Social scientists and computer scientists will come at AI problems from very different directions. And if they collaborate, everybody wins. Social scientists can learn about the complex tools and algorithms used in computer science labs, and computer scientists can become more attuned to the social and ethical implications of advanced AI.

Through transdisciplinary learning, both fields will be better equipped to handle the challenges of developing AI, and society as a whole will be safer.

### Silo Busting

Too often, researchers focus on their narrow area of expertise, rarely reaching out to experts in other fields to solve common problems. AI is no different, with thick walls – sometimes literally – separating the social sciences from the computer sciences. Breaking down these walls between research fields is often called silo-busting.

If AI researchers largely operate in silos, they may lose opportunities to learn from other perspectives and collaborate with potential colleagues. Scientists might miss gaps in their research or reproduce work already completed by others, because they were secluded away in their silo. This can significantly hamper the development of value-aligned AI.

To bust these silos, Wendell Wallach organized workshops to facilitate knowledge-sharing among leading computer and social scientists. Wallach, a consultant, ethicist, and scholar at Yale University’s Interdisciplinary Center for Bioethics, holds these workshops at The Hastings Center, where he is a senior advisor.

With co-chairs Gary Marchant, Stuart Russell, and Bart Selman, Wallach held the first workshop in April 2016. “The first workshop was very much about exposing people to what experts in all of these different fields were thinking about,” Wallach explains. “My intention was just to put all of these people in a room and hopefully they’d see that they weren’t all reinventing the wheel, and recognize that there were other people who were engaged in similar projects.”

The workshop intentionally brought together experts from a variety of viewpoints, including engineering ethics, philosophy, and resilience engineering, as well as participants from the Institute of Electrical and Electronics Engineers (IEEE), the Office of Naval Research, and the World Economic Forum (WEF). Wallach recounts, “some were very interested in how you implement sensitivity to moral considerations in AI computationally, and others were more interested in how AI changes the societal context.”

Other participants studied how the engineers of these systems may be susceptible to harmful cognitive biases and conflicts of interest, while still others focused on governance issues surrounding AI. Each of these viewpoints is necessary for developing beneficial AI, and The Hastings Center’s workshop gave participants the opportunity to learn from and teach each other.

But silo-busting is not easy. Wallach explains, “everybody has their own goals, their own projects, their own intentions, and it’s hard to hear someone say, ‘maybe you’re being a little naïve about this.’” When researchers operate exclusively in silos, “it’s almost impossible to understand how people outside of those silos did what they did,” he adds.

The intention of the first workshop was not to develop concrete strategies or proposals, but rather to open researchers’ minds to the broad challenges of developing AI with human values. “My suspicion is, the most valuable things that came out of this workshop would be hard to quantify,” Wallach clarifies. “It’s more like people’s minds were being stretched and opened. That was, for me, what this was primarily about.”

The workshop did yield some tangible results. For example, Marchant and Wallach introduced a pilot project for the international governance of AI, and nearly everyone at the workshop agreed to work on it. Since then, the IEEE, the International Committee of the Red Cross, the UN, the World Economic Forum, and other institutions have agreed to become active partners with The Hastings Center in building global infrastructure to ensure that AI and robotics are beneficial.

This transdisciplinary cooperation is a promising sign that Wallach’s efforts are succeeding in strengthening the global response to AI challenges.

### Value Alignment

Wallach and his co-chairs held a second workshop at the end of October. The participants were mostly scientists, but also included social theorists, a legal scholar, philosophers, and ethicists. The overall goal remained – to bust AI silos and facilitate transdisciplinary cooperation – but this workshop had a narrower focus.

“We made it more about value alignment and machine ethics,” he explains. “The tension in the room was between those who thought the problem [of value alignment] was imminently solvable and those who were deeply skeptical about solving the problem at all.”

In general, Wallach observed that “the social scientists and philosophers tend to overplay the difficulties [of creating AI with full value alignment] and computer scientists tend to underplay the difficulties.”

Wallach believes that while computer scientists will build the algorithms and utility functions for AI, they will need input from social scientists to ensure value alignment. “If a utility function represents 100,000 inputs, social theorists will help the AI researchers understand what those 100,000 inputs are,” he explains. “The AI researchers might be able to come up with 50,000-60,000 on their own, but they’re suddenly going to realize that people who have thought much more deeply about applied ethics are perhaps sensitive to things that they never considered.”

“I’m hoping that enough of [these researchers] learn each other’s language and how to communicate with each other, that they’ll recognize the value they can get from collaborating together,” he says. “I think I see evidence of that beginning to take place.”

### Moving Forward

Developing value-aligned AI is a monumental task with existential risks. Experts from various perspectives must be willing to learn from each other and adapt their understanding of the issue.

In this spirit, The Hastings Center is leading the charge to bring the various AI silos together. After two successful events that resulted in promising partnerships, Wallach and his co-chairs will hold their third workshop in Spring 2018. And while these workshops are a small effort to facilitate transdisciplinary cooperation on AI, Wallach is hopeful.

“It’s a small group,” he admits, “but it’s people who are leaders in these various fields, so hopefully that permeates through the whole field, on both sides.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.