Cognitive Biases and AI Value Alignment: An Interview with Owain Evans

At the core of AI safety, lies the value alignment problem: how can we teach artificial intelligence systems to act in accordance with human goals and values?

Many researchers interact with AI systems to teach them human values, using techniques like inverse reinforcement learning (IRL). In theory, with IRL, an AI system can learn what humans value and how to best assist them by observing human behavior and receiving human feedback.

But human behavior doesn’t always reflect human values, and human feedback is often biased. We say we want healthy food when we’re relaxed, but then we demand greasy food when we’re stressed. Not only do we often fail to live according to our values, but many of our values contradict each other. We value getting eight hours of sleep, for example, but we regularly sleep less because we also value working hard, caring for our children, and maintaining healthy relationships.

AI systems may be able to learn a lot by observing humans, but because of our inconsistencies, some researchers worry that systems trained with IRL will be fundamentally unable to distinguish between value-aligned and misaligned behavior. This could become especially dangerous as AI systems become more powerful: inferring the wrong values or goals from observing humans could lead these systems to adopt harmful behavior.

 

Distinguishing Biases and Values

Owain Evans, a researcher at the Future of Humanity Institute, and Andreas Stuhlmüller, president of the research non-profit Ought, have explored the limitations of IRL in teaching human values to AI systems. In particular, their research exposes how cognitive biases make it difficult for AIs to learn human preferences through interactive learning.

Evans elaborates: “We want an agent to pursue some set of goals, and we want that set of goals to coincide with human goals. The question then is, if the agent just gets to watch humans and try to work out their goals from their behavior, how much are biases a problem there?”

In some cases, AIs will be able to understand patterns of common biases. Evans and Stuhlmüller discuss the psychological literature on biases in their paper, Learning the Preferences of Ignorant, Inconsistent Agents, and in their online book, agentmodels.org. An example of a common pattern discussed in agentmodels.org is “time inconsistency.” Time inconsistency is the idea that people’s values and goals change depending on when you ask them. In other words, “there is an inconsistency between what you prefer your future self to do and what your future self prefers to do.”

Examples of time inconsistency are everywhere. For one, most people value waking up early and exercising if you ask them before bed. But come morning, when it’s cold and dark out and they didn’t get those eight hours of sleep, they often value the comfort of their sheets and the virtues of relaxation. From waking up early to avoiding alcohol, eating healthy, and saving money, humans tend to expect more from their future selves than their future selves are willing to do.

With systematic, predictable patterns like time inconsistency, IRL could make progress with AI systems. But often our biases aren’t so clear. According to Evans, deciphering which actions coincide with someone’s values and which actions spring from biases is difficult or even impossible in general.

“Suppose you promised to clean the house but you get a last minute offer to party with a friend and you can’t resist,” he suggests. “Is this a bias, or your value of living for the moment? This is a problem for using only inverse reinforcement learning to train an AI — how would it decide what are biases and values?”

 

Learning the Correct Values

Despite this conundrum, understanding human values and preferences is essential for AI systems, and developers have a very practical interest in training their machines to learn these preferences.

Already today, popular websites use AI to learn human preferences. With YouTube and Amazon, for instance, machine-learning algorithms observe your behavior and predict what you will want next. But while these recommendations are often useful, they have unintended consequences.

Consider the case of Zeynep Tufekci, an associate professor at the School of Information and Library Science at the University of North Carolina. After watching videos of Trump rallies to learn more about his voter appeal, Tufekci began seeing white nationalist propaganda and Holocaust denial videos on her “autoplay” queue. She soon realized that YouTube’s algorithm, optimized to keep users engaged, predictably suggests more extreme content as users watch more videos. This led her to call the website “The Great Radicalizer.”

This value misalignment in YouTube algorithms foreshadows the dangers of interactive learning with more advanced AI systems. Instead of optimizing advanced AI systems to appeal to our short-term desires and our attraction to extremes, designers must be able to optimize them to understand our deeper values and enhance our lives.

Evans suggests that we will want AI systems that can reason through our decisions better than humans can, understand when we are making biased decisions, and “help us better pursue our long-term preferences.” However, this will entail that AIs suggest things that seem bad to humans on first blush.

One can imagine an AI system suggesting a brilliant, counterintuitive modification to a business plan, and the human just finds it ridiculous. Or maybe an AI recommends a slightly longer, stress-free driving route to a first date, but the anxious driver takes the faster route anyway, unconvinced.

To help humans understand AIs in these scenarios, Evans and Stuhlmüller have researched how AI systems could reason in ways that are comprehensible to humans and can ultimately improve upon human reasoning.

One method (invented by Paul Christiano) is called “amplification,” where humans use AIs to help them think more deeply about decisions. Evans explains: “You want a system that does exactly the same kind of thinking that we would, but it’s able to do it faster, more efficiently, maybe more reliably. But it should be a kind of thinking that if you broke it down into small steps, humans could understand and follow.”

This second concept is called “factored cognition” – the idea of breaking sophisticated tasks into small, understandable steps. According to Evans, it’s not clear how generally factored cognition can succeed. Sometimes humans can break down their reasoning into small steps, but often we rely on intuition, which is much more difficult to break down.

 

Specifying the Problem

Evans and Stuhlmüller have started a research project on amplification and factored cognition, but they haven’t solved the problem of human biases in interactive learning – rather, they’ve set out to precisely lay out these complex issues for other researchers.

“It’s more about showing this problem in a more precise way than people had done previously,” says Evans. “We ended up getting interesting results, but one of our results in a sense is realizing that this is very difficult, and understanding why it’s difficult.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Making AI Safe in an Unpredictable World: An Interview with Thomas G. Dietterich

Our AI systems work remarkably well in closed worlds. That’s because these environments contain a set number of variables, making the worlds perfectly known and perfectly predictable. In these micro environments, machines only encounter objects that are familiar to them. As a result, they always know how they should act and respond. Unfortunately, these same systems quickly become confused when they are deployed in the real world, as many objects aren’t familiar to them. This is a bit of a problem because, when an AI system becomes confused, the results can be deadly.

Consider, for example, a self-driving car that encounters a novel object. Should it speed up, or should it slow down? Or consider an autonomous weapon system that sees an anomaly. Should it attack, or should it power down? Each of these examples involve life-and-death decisions, and they reveal why, if we are to deploy advanced AI systems in real world environments, we must be confident that they will behave correctly when they encounter unfamiliar objects.

Thomas G. Dietterich, Emeritus Professor of Computer Science at Oregon State University, explains that solving this identification problem begins with ensuring that our AI systems aren’t too confident — that they recognize when they encounter a foreign object and don’t misidentify it as something that they are acquainted with. To achieve this, Dietterich asserts that we must move away from (or, at least, greatly modify) the discriminative training methods that currently dominate AI research.

However, to do that, we must first address the “open category problem.”

 

Understanding the Open Category Problem

When driving down the road, we can encounter a near infinite number of anomalies. Perhaps a violent storm will arise, and hail will start to fall. Perhaps our vision will become impeded by smoke or excessive fog. Although these encounters may be unexpected, the human brain is able to easily analyze new information and decide on the appropriate course of action — we will recognize a newspaper drifting across the road and, instead of abruptly slamming on the breaks, continue on our way.

Because of the way that they are programmed, our computer systems aren’t able to do the same.

“The way we use machine learning to create AI systems and software these days generally uses something called ‘discriminative training,’” Dietterich explains, “which implicitly assumes that the world consists of only, say, a thousand different kinds of objects.” This means that, if a machine encounters a novel object, it will assume that it must be one of the thousand things that it was trained on. As a result, such systems misclassify all foreign objects.

This is the “open category problem” that Dietterich and his team are attempting to solve. Specifically, they are trying to ensure that our machines don’t assume that they have encountered every possible object, but are, instead, able to reliably detect — and ultimately respond to — new categories of alien objects.

Dietterich notes that, from a practical standpoint, this means creating an anomaly detection algorithm that assigns an anomaly score to each object detected by the AI system. That score must be compared against a set threshold and, if the anomaly score exceeds the threshold, the system will need to raise an alarm. Dietterich states that, in response to this alarm, the AI system should take a pre-determined safety action. For example, a self-driving car that detects an anomaly might slow down and pull off to the side of the road.

 

Creating a Theoretical Guarantee of Safety

There are two challenges to making this method work. First, Dietterich asserts that we need good anomaly detection algorithms. Previously, in order to determine what algorithms work well, the team compared the performance of eight state-of-the-art anomaly detection algorithms on a large collection of benchmark problems.

The second challenge is to set the alarm threshold so that the AI system is guaranteed to detect a desired fraction of the alien objects, such as 99%. Dietterich says that formulating a reliable setting for this threshold is one of the most challenging research problems because there are, potentially, infinite kinds of alien objects. “The problem is that we can’t have labeled training data for all of the aliens. If we had such data, we would simply train the discriminative classifier on that labeled data,” Dietterich says.

To circumvent this labeling issue, the team assumes that the discriminative classifier has access to a representative sample of “query objects” that reflect the larger statistical population. Such a sample could, for example, be obtained by collecting data from cars driving on highways around the world. This sample will include some fraction of unknown objects, and the remaining objects belong to known object categories.

Notably, the data in the sample is not labeled. Instead, the AI system is given an estimate of the fraction of aliens in the sample. And by combining the information in the sample with the labeled training data that was employed to train the discriminative classifier, the team’s new algorithm can choose a good alarm threshold. If the estimated fraction of aliens is known to be an over-estimate of the true fraction, then the chosen threshold is guaranteed to detect the target percentage of aliens (i.e. 99%).

Ultimately, the above is the first method that can give a theoretical guarantee of safety for detecting alien objects, and a paper reporting the results was presented at ICML 2018. “We are able to guarantee, with high probability, that we can find 99% all of these new objects,” Dietterich says.

In the next stage of their research, Dietterich and his team plan to begin testing their algorithm in a more complex setting. Thus far, they’ve been looking primarily at classification, where the system looks at an image and classifies it. Next, they plan to move to controlling an agent, like a robot of self-driving car. “At each point in time, in order to decide what action to choose, our system will do a ‘look ahead search’ based on a learned model of the behavior of the agent and its environment. If the look ahead arrives at a state that is rated as ‘alien’ by our method, then this indicates that the agent is about to enter a part of the state space where it is not competent to choose correct actions,” Dietterich says. In response, as previously mentioned, the agent should execute a series of safety actions and request human assistance.

But what does this safety action actually consist of?

 

Responding to Aliens

Dietterich notes that, once something is identified as an anomaly and the alarm is sounded, the nature of this fall back system will depend on the machine in question, like whether the AI system is in a self-driving car or autonomous weapon.

To explain how these secondary systems operate, Dietterich turns to self-driving cars. “In the Google car, if the computers lose power, then there’s a backup system that automatically slows the car down and pulls it over to the side of the road.” However, Dietterich clarifies that stopping isn’t always the best course of action. One may assume that a car should come to a halt if an unidentified object crosses its path; however, if the unidentified object happens to be a blanket of snow on a particularly icy day, hitting the breaks gets more complicated. The system would need to factor in the icy roads, any cars that may be driving behind, and whether these cars can break in time to avoid a rear end collision.

But if we can’t predict every eventuality, how can we expect to program an AI system so that it behaves correctly and in a way that is safe?

Unfortunately, there’s no easy answer; however, Dietterich clarifies that there are some general best practices; “There’s no universal solution to the safety problem, but obviously there are some actions that are safer than others. Generally speaking, removing energy from the system is a good idea,” he says. Ultimately, Dietterich asserts that all the work related to programming safe AI really boils down to determining how we want our machines to behave under specific scenarios, and he argues that we need to rearticulate how we characterize this problem, and focus on accounting for all the factors, if we are to develop a sound approach.

Dietterich notes that “when we look at these problems, they tend to get lumped under a classification of ‘ethical decision making,’ but what they really are is problems that are incredibly complex. They depend tremendously on the context in which they are operating, the human beings, the other innovations, the other automated systems, and so on. The challenge is correctly describing how we want the system to behave and then ensuring that our implementations actually comply with those requirements.” And he concludes, “the big risk in the future of AI is the same as the big risk in any software system, which is that we build the wrong system, and so it does the wrong thing. Arthur C Clark in 2001: A Space Odyssey had it exactly right. The Hal 9000 didn’t ‘go rogue;’ it was just doing what it had been programmed to do.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Governing AI: An Inside Look at the Quest to Ensure AI Benefits Humanity

Click here to see this page in other languages:  Russian 

Finance, education, medicine, programming, the arts — artificial intelligence is set to disrupt nearly every sector of our society. Governments and policy experts have started to realize that, in order to prepare for this future, in order to minimize the risks and ensure that AI benefits humanity, we need to start planning for the arrival of advanced AI systems today.

Although we are still in the early moments of this movement, the landscape looks promising. Several nations and independent firms have already started to strategize and develop polices for the governance of AI. Last year, the UAE appointed the world’s first Minister of Artificial Intelligence, and Germany took smaller, but similar, steps in 2017, when the Ethics Commission at the German Ministry of Transport and Digital Infrastructure developed the world’s first set of regulatory guidelines for automated and connected driving.

This work is notable; however, these efforts have yet to coalesce into a larger governance framework that extends beyond national boundaries. Nick Bostrom’s Strategic Artificial Intelligence Research Center seeks to assist in resolving this issue by understanding, and ultimately shaping, the strategic landscape of long-term AI development on a global scale.

 

Developing a Global Strategy: Where We Are Today

The Strategic Artificial Intelligence Research Center was founded in 2015 with the knowledge that, to truly circumvent the threats posed by AI, the world needs a concerted effort focused on tackling unsolved problems related to AI policy and development. The Governance of AI Program (GovAI), co-directed by Bostrom and Allan Dafoe, is the primary research program that has evolved from this center. Its central mission, as articulated by the directors, is to “examine the political, economic, military, governance, and ethical dimensions of how humanity can best navigate the transition to such advanced AI systems.” In this respect, the program is focused on strategy — on shaping the social, political, and governmental systems that influence AI research and development — as opposed to focusing on the technical hurdles that must be overcome in order to create and program safe AI.

To develop a sound AI strategy, the program works with social scientists, politicians, corporate leaders, and artificial intelligence/machine learning engineers to address questions of how we should approach the challenge of governing artificial intelligence. In a recent 80,0000 Hours podcast with Rob Wiblin, Dafoe outlined how the team’s research shapes up from a practical standpoint, asserting that the work focuses on answering questions that fall under three primary categories:

  • The Technical Landscape: This category seeks to answer all the questions that are related to research trends in the field of AI with the aim of understanding what future technological trajectories are plausible and how these trajectories affect the challenges of governing advanced AI systems.
  • AI Politics: This category focuses on questions that are related to the dynamics of different groups, corporations, and governments pursuing their own interests in relation to AI, and it seeks to understand what risks might arise as a result and how we may be able to mitigate these risks.
  • AI Governance: This category examines positive visions of a future in which humanity coordinates to govern advanced AI in a safe and robust manner. This raises questions such as how this framework should operate and what values we would want to encode in a governance regime.

The above categories provide a clearer way of understanding the various objectives of those invested in researching AI governance and strategy; however, these categories are fairly large in scope. To help elucidate the work they are performing, Jade Leung, a researcher with GovAI and a DPhil candidate in International Relations at the University of Oxford, outlined some of the specific workstreams that the team is currently pursuing.

One of the most intriguing areas of research is the Chinese AI Strategy workstream. This line of research examines things like China’s AI capabilities vis-à-vis other countries, official documentation regarding China’s AI policy, and the various power dynamics at play in the nation with an aim of understanding, as Leung summarizes, “China’s ambition to become an AI superpower and the state of Chinese thinking on safety, cooperation, and AGI.” Ultimately, GovAI seeks to outline the key features of China’s AI strategy in order to understand one of the most important actors in AI governance. The program published Deciphering China’s AI Dream in March of 2018a report that analyzes new features of China’s national AI strategy, and has plans to build upon research in the near future.

Another workstream is Firm-Government Cooperation, which examines the role that private firms play in relation to the development of advanced AI and how these players are likely to interact with national governments. In a recent talk at EA Global San Francisco, Leung focused on how private industry is already playing a significant role in AI development and why, when considering how to govern AI, private players must be included in strategy considerations as a vital part of the equation. The description of the talk succinctly summarizes the key focal areas, noting that “private firms are the only prominent actors that have expressed ambitions to develop AGI, and lead at the cutting edge of advanced AI research. It is therefore critical to consider how these private firms should be involved in the future of AI governance.”

Other work that Leung highlighted includes modeling technology race dynamics and analyzing the distribution of AI talent and hardware globally.

 

The Road Ahead

When asked how much confidence she has that AI researchers will ultimately coalesce and be successful in their attempts to shape the landscape of long-term AI development internationally, Leung was cautious with her response, noting that far more hands are needed. “There is certainly a greater need for more researchers to be tackling these questions. As a research area as well as an area of policy action, long-term safe and robust AI governance remains a neglected mission,” she said.

Additionally, Leung noted that, at this juncture, although some concrete research is already underway, a lot of the work is focused on framing issues related to AI governance and, in so doing, revealing the various avenues in need of research. As a result, the team doesn’t yet have concrete recommendations for specific actions governing bodies should commit to, as further foundational analysis is needed. “We don’t have sufficiently robust and concrete policy recommendations for the near term as it stands, given the degrees of uncertainty around this problem,” she said.

However, both Leung and Defoe are optimistic and assert that this information gap will likely change — and rapidly. Researchers across disciplines are increasingly becoming aware of the significance of this topic, and as more individuals begin researching and participating in this community, the various avenues of research will become more focused. “In two years, we’ll probably have a much more substantial research community. But today, we’re just figuring out what are the most important and tractable problems and how we can best recruit to work on those problems,” Dafoe told Wiblin.

The assurances that a more robust community will likely form soon are encouraging; however, questions remain regarding whether this community will come together with enough time to develop a solid governance framework. As Dafoe notes, we have never witnessed an intelligence explosion before, so we have no examples to look to for guidance when attempting to develop projections and timelines regarding when we will have advanced AI systems.

Ultimately, the lack of projections is precisely why we must significantly invest in AI strategy research in the immediate future. As Bostrom notes in Superintelligence: Paths, Dangers, and Strategies, AI is not simply a disruptive technology, it is likely the most disruptive technology humanity will ever encounter: “[Superintelligence] is quite possibly the most important and most daunting challenge humanity has ever faced. And — whether we succeed or fail — it is probably the last challenge we will ever face.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Edit: The title of the article has been changed to reflect the fact that this is not about regulating AI.

Machine Reasoning and the Rise of Artificial General Intelligences: An Interview With Bart Selman

From Uber’s advanced computer vision system to Netflix’s innovative recommendation algorithm, machine learning technologies are nearly omnipresent in our society. They filter our emails, personalize our newsfeeds, update our GPS systems, and drive our personal assistants. However, despite the fact that such technologies are leading a revolution in artificial intelligence, some would contend that these machine learning systems aren’t truly intelligent.

The argument, in its most basic sense, centers on the fact that machine learning evolved from theories of pattern recognition and, as such, the capabilities of such systems generally extend to just one task and are centered on making predictions from existing data sets. AI researchers like Rodney Brooks, a former professor of Robotics at MIT, argue that true reasoning, and true intelligence, is several steps beyond these kinds of learning systems.

But if we already have machines that are proficient at learning through pattern recognition, how long will it be until we have machines that are capable of true reasoning, and how will AI evolve once it reaches this point?

Understanding the pace and path that artificial reasoning will follow over the coming decades is an important part of ensuring that AI is safe, and that it does not pose a threat to humanity; however, before it is possible to understand the feasibility of machine reasoning across different categories of cognition, and the path that artificial intelligences will likely follow as they continue their evolution, it is necessary to first define exactly what is meant by the term “reasoning.”

 

Understanding Intellect

Bart Selman is a professor of Computer Science at Cornell University. His research is dedicated to understanding the evolution of machine reasoning. According to his methodology, reasoning is described as taking pieces of information, combining them together, and using the fragments to draw logical conclusions or devise new information.

Sports provide a ready example of expounding what machine reasoning is really all about. When humans see soccer players on a field kicking a ball about, they can, with very little difficulty, ascertain that these individuals are soccer players. Today’s AI can also make this determination. However, humans can also see a person in a soccer outfit riding a bike down a city street, and they would still be able to infer that the person is a soccer player. Today’s AIs probably wouldn’t be able to make this connection.

This process— of taking information that is known, uniting it with background knowledge, and making inferences regarding information that is unknown or uncertain — is a reasoning process. To this end, Selman notes that machine reasoning is not about making predictions, it’s about using logical techniques (like the abductive process mentioned above) to answer a question or form an inference.

Since humans do not typically reason through pattern recognition and synthesis, but by using logical processes like induction, deduction, and abduction, Selman asserts that machine reasoning is a form of intelligence that is more like human intelligence. He continues by noting that the creation of machines that are endowed with more human-like reasoning processes, and breaking away from traditional pattern recognition approaches, is the key to making systems that not only predict outcomes but also understand and explain their solutions. However, Selman notes that making human-level AI is also the first step to attaining super-human levels of cognition.

And due to the existential threat this could pose to humanity, it is necessary to understand exactly how this evolution will unfold.

 

The Making of a (super)Mind

It may seem like truly intelligent AI are a problem for future generations. Yet, when it comes to machines, the consensus among AI experts is that rapid progress is already being made in machine reasoning. In fact, many researchers assert that human-level cognition will be achieved across a number of metrics in the next few decades. Yet, questions remain regarding how AI systems will advance once artificial general intelligence is realized. A key question is whether these advances can accelerate farther and scale-up to super-human intelligence.

This process is something that Selman has devoted his life to studying. Specifically, he researches the pace of AI scalability across different categories of cognition and the feasibility of super-human levels of cognition in machines.

Selman states that attempting to make blanket statements about when and how machines will surpass humans is a difficult task, as machine cognition is disjointed and does not draw a perfect parallel with human cognition. “In some ways, machines are far beyond what humans can do,” Selman explains, “for example, when it comes to certain areas in mathematics, machines can take billions of reasoning steps and see the truth of a statement in a fraction of a second. The human has no ability to do that kind of reasoning.”

However, when it comes to the kind of reasoning mentioned above, where meaning is derived from deductive or inductive processes that are based on the integration of new data, Selman says that computers are somewhat lacking. “In terms of the standard reasoning that humans are good at, they are not there yet,” he explains. Today’s systems are very good at some tasks, sometimes far better than humans, but only in a very narrow range of applications.

Given these variances, how can we determine how AI will evolve in various areas and understand how they will accelerate after general human level AI is achieved?

For his work, Selman relies on computational complexity theory, which has two primary functions. First, it can be used to characterize the efficiency of an algorithm used for solving instances of a problem. As Johns Hopkins’ Leslie Hall notes, “broadly stated, the computational complexity of an algorithm is a measure of how many steps the algorithm will require in the worst case for an instance [of a problem] of a given size.” Second, it is a method of classifying tasks (computational problems) according to their inherent difficulty. These two features provide us with a way of determining how artificial intelligences will likely evolve by offering a formal method of determining the easiest, and therefore most probable, areas of advancement. It also provides key insights into the speed of this scalability.

Ultimately, this work is important, as the abilities of our machines are fast-changing. As Selman notes, “The way that we measure the capabilities of programs that do reasoning is by looking at the number of facts that they can combine quickly. About 25 years ago, the best reasoning engines could combine approximately 200 or 300 facts and deduce new information from that. The current reasoning engines can combine millions of facts.” This exponential growth has great significance when it comes to the scale-up to human levels of machine reasoning.

As Selman explains, given the present abilities of our AI systems, it may seem like machines with true reasoning capabilities are still some ways off; however, thanks to the excessive rate of technological progress, we will likely start to see machines that have intellectual abilities that vastly outpace our own in rather short order. “Ten years from now, we’ll still find them [artificially intelligent machines] very much lacking in understanding, but twenty or thirty years from now, machines will have likely built up the same knowledge that a young adult has,” Selman notes. Anticipating exactly when this transition will occur will help us better understand the actions that we should take, and the research that the current generation must invest in, in order to be prepared for this advancement.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

How Will the Rise of Artificial Superintelligences Impact Humanity?

Cars drive themselves down our streets. Planes fly themselves through our skies. Medical technologies diagnose illnesses, recommend treatment plans, and save lives.

Artificially intelligent systems are already among us, and they have been for some time now. However, the world has yet to see an artificial superintelligence (ASI) — a synthetic system that has cognitive abilities which surpass our own across every relevant metric. But technology is progressing rapidly, and many AI researchers believe the era of the artificial superintelligence may be fast approaching. Once it arrives, researchers and politicians alike have no way of predicting what will happen.

Fortunately, a number of individuals are already working to ensure that the rise of this artificial superintelligence doesn’t precipitate the fall of humanity.

Risky Business

Seth Baum is the Executive Director of the Global Catastrophic Risk Institute, a thinktank that’s focused on preventing the destruction of global civilization.

When Baum discusses his work, he outlines GCRI’s mission with a matter-of-fact tone that, considering the monumental nature of the project, is more than a little jarring. “All of our work is about keeping the world safe,” Baum notes, and he continues by explaining that GCRI focuses on a host of threats that put the survival of our species in peril. From climate change to nuclear war, from extraterrestrial intelligence to artificial intelligence — GCRI covers it all.

When it comes to artificial intelligence, GCRI has several initiatives. However, their main AI project, which received funding from the Future of Life Institute, centers on the risks associated with artificial superintelligences. Or, as Baum puts it, they do “risk analysis for computers taking over the world and killing everyone.” Specifically, Baum stated that GCRI is working on “developing structured risk models to help people understand what the risks might be and, also, where some of the best opportunities to reduce this risk are located.”

Unsurprisingly, the task is not an easy one.

The fundamental problem stems from the fact that, unlike more common threats, such as the risk of dying in a car accident or the risk of getting cancer, researchers working on ASI risk analysis don’t have solid case studies to use when making their models and predictions. As Baum states, “Computers have never taken over the world and killed everyone before. That means we can’t just look at the data, which is what we do for a lot of other risks. And not only has this never happened before, the technology doesn’t even exist yet. And if it is built, we’re not sure how it would be built.”

So, how can researchers determine the risks posed by an artificial superintelligence if they don’t know exactly what that intelligence will look like and they have no real data to work with?

Luckily, when it comes to artificial superintelligences, AI experts aren’t totally in the realm of the unknown. Baum asserts that there are some ideas and a bit of relevant evidence, but these things are scattered. To address this issue, Baum and his team create models. They take what information is available, structure it, and then distribute the result in an organized fashion so that researchers can better understand the topic, the various factors that may influence the outcome of the issue at hand, and ultimately have a better understanding of the various risks associated with ASI.

For example, when attempting to figure how easy is it to design an AI so that it acts safely, one of the subdetails that needs to be modeled is whether or not humans will be able to observe the AI and test it before it gets out of control. In other words, whether AI researchers can recognize that an AI has a dangerous design and shut it down. To model this scenario and determine what the risks and most likely scenarios are, Baum and his team take the available information — the perspectives and opinions of AI researchers, what is already known about AI technology and how it functions, etc. — and they model the topic by structuring the aforementioned information along with any uncertainty in the arguments or data sets.

This kind of modeling and risk analysis ultimately allows the team to better understand the scope of the issue and, by structuring the information in a clear way, advance an ongoing conversation in the superintelligence research community. The modeling doesn’t give us a complete picture of what will happen, but it does allow us to better understand the risks that we’re facing when it comes to the rise of ASI, what events and outcomes are likely, as well as the specific steps that policy makers and AI researchers should take to ensure that ASI benefits humanity.

Of course, when it comes to the risks of artificial superintelligences, whether or not we will be able to observe and test our AI is just one small part of a much larger model.

Modeling a Catastrophe

In order to understand what it would take to bring about the ASI apocalypse, and how we could possibly prevent it, Baum and his team have created a model that investigates the following questions from a number of vantage points:

  • Step 1: Is it possible to build an artificial superintelligence?
  • Step 2: Will humans build the superintelligence?
  • Step 3: Will humans lose control of the superintelligence?

This first half of the model is centered on the nuts and bolts of how to build an ASI. The second half of the model dives into risk analysis related to the creation of an ASI that is harmful and looks at the following:

  • Step 1: Will humans design an artificial superintelligence that is harmful?
  • Step 2: Will the superintelligence develop harmful behavior on its own?
  • Step 3: Is there something deterring the superintelligence from acting in a way that is harmful (such as another AI or some human action)?

Each step in this series models a number of different possibilities to reveal the various risks that we face and how significant, and probable, these threats are. Although the model is still being refined, Baum says that substantial progress has already been made. “The risk is starting to make sense. I’m starting to see exactly what it would take to see this type of catastrophe,” Baum said. Yet, he is quick to clarify that the research is still a bit too young to say much definitively, “Those of us who study superintelligence and all the risks and policy aspects of it, we’re not exactly sure what policy we would want right now. What’s happening right now is more of a general-purpose conversation on AI. It’s one that recognizes the fact that AI is more than just a technological and economic opportunity and that there are risks involved and difficult ethical issues.”

Ultimately, Baum hopes that these conversations, when coupled with the understanding that comes from the models that he is currently developing alongside his team, will allow GCRI to better prepare policy makers and scientists alike for the rise of a new kind of (super)intelligence.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Teaching Today’s AI Students To Be Tomorrow’s Ethical Leaders: An Interview With Yan Zhang

Some of the greatest scientists and inventors of the future are sitting in high school classrooms right now, breezing through calculus and eagerly awaiting freshman year at the world’s top universities. They may have already won Math Olympiads or invented clever, new internet applications. We know these students are smart, but are they prepared to responsibly guide the future of technology?

Developing safe and beneficial technology requires more than technical expertise — it requires a well-rounded education and the ability to understand other perspectives. But since math and science students must spend so much time doing technical work, they often lack the skills and experience necessary to understand how their inventions will impact society.

These educational gaps could prove problematic as artificial intelligence assumes a greater role in our lives. AI research is booming among young computer scientists, and these students need to understand the complex ethical, governance, and safety challenges posed by their innovations.

 

SPARC

In 2012, a group of AI researchers and safety advocates – Paul Christiano, Jacob Steinhardt, Andrew Critch, Anna Salamon, and Yan Zhang – created the Summer Program in Applied Rationality and Cognition (SPARC) to address the many issues that face quantitatively strong teenagers, including the issue of educational gaps in AI. As with all technologies, they explain, the more the AI community consists of thoughtful, intelligent, broad-minded reasoners, the more likely AI is to be developed in a safe and beneficial manner.

Each summer, the SPARC founders invite 30-35 mathematically gifted high school students to participate in their two-week program. Zhang, SPARC’s director, explains: “Our goals are to generate a strong community, expose these students to ideas that they’re not going to get in class – blind spots of being a quantitatively strong teenager in today’s world, like empathy and social dynamics. Overall we want to make them more powerful individuals who can bring positive change to the world.”

To help students make a positive impact, SPARC instructors teach core ideas in effective altruism (EA). “We have a lot of conversations about EA, but we don’t push the students to become EA,” Zhang says. “We expose them to good ideas, and I think that’s a healthier way to do mentorship.”

SPARC also exposes students to machine learning, AI safety, and existential risks. In 2016 and 2017, they held over 10 classes on these topics, including: “Machine Learning” and “Tensorflow” taught by Jacob Steinhardt, “Irresponsible Futurism” and “Effective Do-Gooding” taught by Paul Christiano, “Optimization” taught by John Schulman, and “Long-Term Thinking on AI and Automization” taught by Michael Webb.

But SPARC instructors don’t push students down the AI path either. Instead, they encourage students to apply SPARC’s holistic training to make a more positive impact in any field.

 

Thinking on the Margin: The Role of Social Skills

Making the most positive impact requires thinking on the margin, and asking: What one additional unit of knowledge will be most helpful for creating positive impact? For these students, most of whom have won Math and Computing Olympiads, it’s usually not more math.

“A weakness of a lot of mathematically-minded students are things like social skills or having productive arguments with people,” Zhang says. “Because to be impactful you need your quantitative skills, but you need to also be able to relate with people.”

To counter this weakness, he teaches classes on social skills and signaling, and occasionally leads improvisational games. SPARC still teaches a lot of math, but Zhang is more interested in addressing these students’ educational blind spots – the same blind spots that the instructors themselves had as students. “What would have made us more impactful individuals, and also more complete and more human in many ways?” he asks.

Working with non-math students can help, so Zhang and his colleagues have experimented with bringing excellent writers and original thinkers into the program. “We’ve consistently had really good successes with those students, because they bring something that the Math Olympiad kids don’t have,” Zhang says.

SPARC also broadens students’ horizons with guest speakers from academia and organizations such as the Open Philanthropy Project, OpenAI, Dropbox and Quora. In one talk, Dropbox engineer Albert Ni spoke to SPARC students about “common mistakes that math people make when they try to do things later in life.”

In another successful experiment suggested by Ofer Grossman, a SPARC alum who is now a staff member, SPARC made half of all classes optional in 2017. The classes were still packed because students appreciated the culture. The founders also agreed that conversations after class are often more impactful than classes, and therefore engineered one-on-one time and group discussions into the curriculum. Thinking on the margin, they ask: “What are the things that were memorable about school? What are the good parts? Can we do more of those and less of the others?”

Above all, SPARC fosters a culture of openness, curiosity and accountability. Inherent in this project is “cognitive debiasing” – learning about common biases like selection bias and confirmation bias, and correcting for them. “We do a lot of de-biasing in our interactions with each other, very explicitly,” Zhang says. “We also have classes on cognitive biases, but the culture is the more important part.”

 

AI Research and Future Leaders

Designing safe and beneficial technology requires technical expertise, but in SPARC’s view, cultivating a holistic research culture is equally important. Today’s top students may make some of the most consequential AI breakthroughs in the future, and their values, education and temperament will play a critical role in ensuring that advanced AI is deployed safely and for the common good.

“This is also important outside of AI,” Zhang explains. “The official SPARC stance is to make these students future leaders in their communities, whether it’s AI, academia, medicine, or law. These leaders could then talk to each other and become allies instead of having a bunch of splintered, narrow disciplines.”

As SPARC approaches its 7th year, some alumni have already begun to make an impact. A few AI-oriented alumni recently founded AlphaSheets – a collaborative, programmable spreadsheet for finance that is less prone to error – while other students are leading a “hacker house” with people in Silicon Valley. Additionally, SPARC inspired the creation of ESPR, a similar European program explicitly focused on AI risk.

But most impacts will be less tangible. “Different pockets of people interested in different things have been working with SPARC’s resources, and they’re forming a lot of social groups,” Zhang explains. “It’s like a bunch of little sparks and we don’t quite know what they’ll become, but I’m pretty excited about next five years.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

How AI Handles Uncertainty: An Interview With Brian Ziebart

Click here to see this page in other languages:  Russian

When training image detectors, AI researchers can’t replicate the real world. They teach systems what to expect by feeding them training data, such as photographs, computer-generated images, real video and simulated video, but these practice environments can never capture the messiness of the physical world.

In machine learning (ML), image detectors learn to spot objects by drawing bounding boxes around them and giving them labels. And while this training process succeeds in simple environments, it gets complicated quickly.

 

 

 

 

 

 

 

It’s easy to define the person on the left, but how would you draw a bounding box around the person on the right? Would you only include the visible parts of his body, or also his hidden torso and legs? These differences may seem trivial, but they point to a fundamental problem in object recognition: there rarely is a single best way to define an object.

As this second image demonstrates, the real world is rarely clear-cut, and the “right” answer is usually ambiguous. Yet when ML systems use training data to develop their understanding of the world, they often fail to reflect this. Rather than recognizing uncertainty and ambiguity, these systems often confidently approach new situations no differently than their training data, which can put the systems and humans at risk.

Brian Ziebart, a Professor of Computer Science at the University of Illinois at Chicago, is conducting research to improve AI systems’ ability to operate amidst the inherent uncertainty around them. The physical world is messy and unpredictable, and if we are to trust our AI systems, they must be able to safely handle it.

 

Overconfidence in ML Systems

ML systems will inevitably confront real-world scenarios that their training data never prepared them for. But, as Ziebart explains, current statistical models “tend to assume that the data that they’ll see in the future will look a lot like the data they’ve seen in the past.”

As a result, these systems are overly confident that they know what to do when they encounter new data points, even when those data points look nothing like what they’ve seen. ML systems falsely assume that their training prepared them for everything, and the resulting overconfidence can lead to dangerous consequences.

Consider image detection for a self-driving car. A car might train its image detection on data from the dashboard of another car, tracking the visual field and drawing bounding boxes around certain objects, as in the image below:

Bounding boxes on a highway – CloudFactory Blog

 

 

 

 

 

 

 

 

 

 

 

 

For clear views like this, image detectors excel. But the real world isn’t always this simple. If researchers train an image detector on clean, well-lit images in the lab, it might accurately recognize objects 80% of the time during the day. But when forced to navigate roads on a rainy night, it might drop to 40%.

“If you collect all of your data during the day and then try to deploy the system at night, then however it was trained to do image detection during the day just isn’t going to work well when you generalize into those new settings,” Ziebart explains.

Moreover, the ML system might not recognize the problem: since the system assumes that its training covered everything, it will remain confident about its decisions and continue “to make strong predictions that are just inaccurate,” Ziebart adds.

In contrast, humans tend to recognize when previous experience doesn’t generalize into new settings. If a driver spots an unknown object ahead in the road, she wouldn’t just plow through the object. Instead, she might slow down, pay attention to how other cars respond to the object, and consider swerving if she can do so safely. When humans feel uncertain about our environment, we exercise caution to avoid making dangerous mistakes.

Ziebart would like AI systems to incorporate similar levels of caution in uncertain situations. Instead of confidently making mistakes, a system should recognize its uncertainty and ask questions to glean more information, much like an uncertain human would.

 

An Adversarial Approach

Training and practice may never prepare AI systems for every possible situation, but researchers can make their training methods more foolproof. Ziebart posits that feeding systems messier data in the lab can train them to better recognize and address uncertainty.

Conveniently, humans can provide this messy, real-world data. By hiring a group of human annotators to look at images and draw bounding boxes around certain objects – cars, people, dogs, trees, etc. – researchers can “build into the classifier some idea of what ‘normal’ data looks like,” Ziebart explains.

“If you ask ten different people to provide these bounding boxes, you’re likely to get back ten different bounding boxes,” he says. “There’s just a lot of inherent ambiguity in how people think about the ground truth for these things.”

Returning to the image above of the man in the car, human annotators might give ten different bounding boxes that capture different portions of the visible and hidden person. By feeding ML systems this confusing and contradictory data, Ziebart prepares them to expect ambiguity.

“We’re synthesizing more noise into the data set in our training procedure,” Ziebart explains. This noise reflects the messiness of the real world, and trains systems to be cautious when making predictions in new environments. Cautious and uncertain, AI systems will seek additional information and learn to navigate the confusing situations they encounter.

Of course, self-driving cars shouldn’t have to ask questions. If a car’s image detection spots a foreign object up ahead, for instance, it won’t have time to ask humans for help. But if it’s trained to recognize uncertainty and act cautiously, it might slow down, detect what other cars are doing, and safely navigate around the object.

 

Building Blocks for Future Machines

Ziebart’s research remains in training settings thus far. He feeds systems messy, varied data and trains them to provide bounding boxes that have at least 70% overlap with people’s bounding boxes. And his process has already produced impressive results. On an ImageNet object detection task investigated in collaboration with Sima Behpour (University of Illinois at Chicago) and Kris Kitani (Carnegie Mellon University), for example, Ziebart’s adversarial approach “improves performance by over 16% compared to the best performing data augmentation method.” Trained to operate amidst uncertain environments, these systems more effectively manage new data points that training didn’t explicitly prepare them for.

But while Ziebart trains relatively narrow AI systems, he believes that this research can scale up to more advanced systems like autonomous cars and public transit systems.

“I view this as kind of a fundamental issue in how we design these predictors,” he says. “We’ve been trying to construct better building blocks on which to make machine learning – better first principles for machine learning that’ll be more robust.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Optimizing AI Safety Research: An Interview With Owen Cotton-Barratt

Artificial intelligence poses a myriad of risks to humanity. From privacy concerns, to algorithmic bias and “black box” decision making, to broader questions of value alignment, recursive self-improvement, and existential risk from superintelligence — there’s no shortage of AI safety issues.  

AI safety research aims to address all of these concerns. But with limited funding and too few researchers, trade-offs in research are inevitable. In order to ensure that the AI safety community tackles the most important questions, researchers must prioritize their causes.

Owen Cotton-Barratt, along with his colleagues at the Future of Humanity Institute (FHI) and the Centre for Effective Altruism (CEA), looks at this ‘cause prioritization’ for the AI safety community. They analyze which projects are more likely to help mitigate catastrophic or existential risks from highly-advanced AI systems, especially artificial general intelligence (AGI). By modeling trade-offs between different types of research, Cotton-Barratt hopes to guide scientists toward more effective AI safety research projects.

 

Technical and Strategic Work

The first step of cause prioritization is understanding the work already being done. Broadly speaking, AI safety research happens in two domains: technical work and strategic work.

AI’s technical safety challenge is to keep machines safe and secure as they become more capable and creative. By making AI systems more predictable, more transparent, and more robustly aligned with our goals and values, we can significantly reduce the risk of harm. Technical safety work includes Stuart Russell’s research on reinforcement learning and Dan Weld’s work on explainable machine learning, since they’re improving the actual programming in AI systems.

In addition, the Machine Intelligence Research Institute (MIRI) recently released a technical safety agenda aimed at aligning machine intelligence with human interests in the long term, while OpenAI, another non-profit AI research company, is investigating the “many research problems around ensuring that modern machine learning systems operate as intended,” following suggestions from the seminal paper Concrete Problems in AI Safety.

Strategic safety work is broader, and asks how society can best prepare for and mitigate the risks of powerful AI. This research includes analyzing the political environment surrounding AI development, facilitating open dialogue between research areas, disincentivizing arms races, and learning from game theory and neuroscience about probable outcomes for AI. Yale professor Allan Dafoe has recently focused on strategic work, researching the international politics of artificial intelligence and consulting for governments, AI labs and nonprofits about AI risks. And Yale bioethicist Wendell Wallach, apart from his work on “silo busting,” is researching forms of global governance for AI.

Cause prioritization is strategy work, as well. Cotton-Barratt explains, “Strategy work includes analyzing the safety landscape itself and considering what kind of work do we think we’re going to have lots of, what are we going to have less of, and therefore helping us steer resources and be more targeted in our work.”

 

 

 

 

 

 

 

 

 

 

 

Who Needs More Funding?

As the graph above illustrates, AI safety spending has grown significantly since 2015. And while more money doesn’t always translate into improved results, funding patterns are easy to assess and can say a lot about research priorities. Seb Farquhar, Cotton-Barratt’s colleague at CEA, wrote a post earlier this year analyzing AI safety funding and suggesting ways to better allocate future investments.

To start, he suggests that the technical research community acquire more personal investigators to take the research agenda, detailed in Concrete Problems in AI Safety, forward. OpenAI is already taking a lead on this. Additionally, the community should go out of its way to ensure that emerging AI safety centers hire the best candidates, since these researchers will shape each center’s success for years to come.

In general, Farquhar notes that strategy, outreach and policy work haven’t kept up with the overall growth of AI safety research. He suggests that more people focus on improving communication about long-run strategies between AI safety research teams, between the AI safety community and the broader AI community, and between policymakers and researchers. Building more PhD and Masters courses on AI strategy and policy could establish a pipeline to fill this void, he adds.

To complement Farquhar’s data, Cotton-Barratt’s colleague Max Dalton created a mathematical model to track how more funding and more people working on a safety problem translate into useful progress or solutions. The model tries to answer such questions as: if we want to reduce AI’s existential risks, how much of an effect do we get by investing money in strategy research versus technical research?

In general, technical research is easier to track than strategic work in mathematical models. For example, spending more on strategic ethics research may be vital for AI safety, but it’s difficult to quantify that impact. Improving models of reinforcement learning, however, can produce safer and more robustly-aligned machines. With clearer feedback loops, these technical projects fit best with Dalton’s models.

 

Near-sightedness and AGI

But these models also confront major uncertainty. No one really knows when AGI will be developed, and this makes it difficult to determine the most important research. If AGI will be developed in five years, perhaps researchers should focus only on the most essential safety work, such as improving transparency in AI systems. But if we have thirty years, researchers can probably afford to dive into more theoretical work.

Moreover, no one really knows how AGI will function. Machine learning and deep neural networks have ushered in a new AI revolution, but AGI will likely be developed on architectures far different from AlphaGo and Watson.

This makes some long-term safety research a risky investment, even if, as many argue, it is the most important research we can do. For example, researchers could spend years making deep neural nets safe and transparent, only to find their work wasted when AGI develops on an entirely different programming architecture.

Cotton-Barratt attributes this issue to ‘nearsightedness,’ and discussed it in a recent talk at Effective Altruism Global this summer. Humans often can’t anticipate disruptive change, and AI researchers are no exception.

“Work that we might do for long-term scenarios might turn out to be completely confused because we weren’t thinking of the right type of things,” he explains. “We have more leverage over the near-term scenarios because we’re more able to assess what they’re going to look like.”

Any additional AI safety research is better than none, but given the unknown timelines and the potential gravity of AI’s threats to humanity, we’re better off pursuing — to the extent possible — the most effective AI safety research.

By helping the AI research portfolio advance in a more efficient and comprehensive direction, Cotton-Barratt and his colleagues hope to ensure that when machines eventually outsmart us, we will have asked — and hopefully answered — the right questions.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project. If you’re interested in applying for our 2018 grants competition, please see this link.

Transparent and Interpretable AI: an interview with Percy Liang

At the end of 2017, the United States House of Representatives passed a bill called the SELF DRIVE Act, laying out an initial federal framework for autonomous vehicle regulation. Autonomous cars have been undergoing testing on public roads for almost two decades. With the passing of this bill, along with the increasing safety benefits of autonomous vehicles, it is likely that they will become even more prevalent in our daily lives. This is true for numerous autonomous technologies including those in the medical, legal, and safety fields – just to name a few.

To that end, researchers, developers, and users alike must be able to have confidence in these types of technologies that rely heavily on artificial intelligence (AI). This extends beyond autonomous vehicles, applying to everything from security devices in your smart home to the personal assistant in your phone.

 

Predictability in Machine Learning

Percy Liang, Assistant Professor of Computer Science at Stanford University, explains that humans rely on some degree of predictability in their day-to-day interactions — both with other humans and automated systems (including, but not limited to, their cars). One way to create this predictability is by taking advantage of machine learning.

Machine learning deals with algorithms that allow an AI to “learn” based on data gathered from previous experiences. Developers do not need to write code that dictates each and every action or intention for the AI. Instead, the system recognizes patterns from its experiences and assumes the appropriate action based on that data. It is akin to the process of trial and error.

A key question often asked of machine learning systems in the research and testing environment is, “Why did the system make this prediction?” About this search for intention, Liang explains:

“If you’re crossing the road and a car comes toward you, you have a model of what the other human driver is going to do. But if the car is controlled by an AI, how should humans know how to behave?”

It is important to see that a system is performing well, but perhaps even more important is its ability to explain in easily understandable terms why it acted the way it did. Even if the system is not accurate, it must be explainable and predictable. For AI to be safely deployed, systems must rely on well-understood, realistic, and testable assumptions.

Current theories that explore the idea of reliable AI focus on fitting the observable outputs in the training data. However, as Liang explains, this could lead “to an autonomous driving system that performs well on validation tests but does not understand the human values underlying the desired outputs.”

Running multiple tests is important, of course. These types of simulations, explains Liang, “are good for debugging techniques — they allow us to more easily perform controlled experiments, and they allow for faster iteration.”

However, to really know whether a technique is effective, “there is no substitute for applying it to real life,” says Liang, “ this goes for language, vision, and robotics.” An autonomous vehicle may perform well in all testing conditions, but there is no way to accurately predict how it could perform in an unpredictable natural disaster.

 

Interpretable ML Systems

The best-performing models in many domains — e.g., deep neural networks for image and speech recognition — are obviously quite complex. These are considered “blackbox models,” and their predictions can be difficult, if not impossible, for them to explain.

Liang and his team are working to interpret these models by researching how a particular training situation leads to a prediction. As Liang explains, “Machine learning algorithms take training data and produce a model, which is used to predict on new inputs.”

This type of observation becomes increasingly important as AIs take on more complex tasks – think life or death situations, such as interpreting medical diagnoses. “If the training data has outliers or adversarially generated data,” says Liang, “this will affect (corrupt) the model, which will in turn cause predictions on new inputs to be possibly wrong.  Influence functions allow you to track precisely the way that a single training point would affect the prediction on a particular new input.”

Essentially, by understanding why a model makes the decisions it makes, Liang’s team hopes to improve how models function, discover new science, and provide end users with explanations of actions that impact them.

Another aspect of Liang’s research is ensuring that an AI understands, and is able to communicate, its limits to humans. The conventional metric for success, he explains, is average accuracy, “which is not a good interface for AI safety.” He posits, “what is one to do with an 80 percent reliable system?”

Liang is not looking for the system to have an accurate answer 100 percent of the time. Instead, he wants the system to be able to admit when it does not know an answer. If a user asks a system “How many painkillers should I take?” it is better for the system to say, “I don’t know” rather than making a costly or dangerous incorrect prediction.

Liang’s team is working on this challenge by tracking a model’s predictions through its learning algorithm — all the way back to the training data where the model parameters originated.

Liang’s team hopes that this approach — of looking at the model through the lens of the training data — will become a standard part of the toolkit of developing, understanding, and diagnosing machine learning. He explains that researchers could relate this to many applications: medical, computer, natural language understanding systems, and various business analytics applications.

“I think,” Liang concludes, “there is some confusion about the role of simulations some eschew it entirely and some are happy doing everything in simulation. Perhaps we need to change culturally to have a place for both.

In this way, Liang and his team plan to lay a framework for a new generation of machine learning algorithms that work reliably, fail gracefully, and reduce risks.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project. If you’re interested in applying for our 2018 grants competition, please see this link.

Explainable AI: a discussion with Dan Weld

Machine learning systems are confusing – just ask any AI researcher. Their deep neural networks operate incredibly quickly, considering thousands of possibilities in seconds before making decisions. The human brain simply can’t keep up.

When people learn to play Go, instructors can challenge their decisions and hear their explanations. Through this interaction, teachers determine the limits of a student’s understanding. But DeepMind’s AlphaGo, which recently beat the world’s champions at Go, can’t answer these questions. When AlphaGo makes an unexpected decision it’s difficult to understand why it made that choice.

Admittedly, the stakes are low with AlphaGo: no one gets hurt if it makes an unexpected move and loses. But deploying intelligent machines that we can’t understand could set a dangerous precedent.

According to computer scientist Dan Weld, understanding and trusting machines is “the key problem to solve” in AI safety, and it’s necessary today. He explains, “Since machine learning is at the core of pretty much every AI success story, it’s really important for us to be able to understand what it is that the machine learned.”

As machine learning (ML) systems assume greater control in healthcare, transportation, and finance, trusting their decisions becomes increasingly important. If researchers can program AIs to explain their decisions and answer questions, as Weld is trying to do, we can better assess whether they will operate safely on their own.

 

Teaching Machines to Explain Themselves

Weld has worked on techniques that expose blind spots in ML systems, or “unknown unknowns.”

When an ML system faces a “known unknown,” it recognizes its uncertainty with the situation. However, when it encounters an unknown unknown, it won’t even recognize that this is an uncertain situation: the system will have extremely high confidence that its result is correct, but it will be wrong. Often, classifiers have this confidence because they were “trained on data that had some regularity in it that’s not reflected in the real world,” Weld says.

Consider an ML system that has been trained to classify images of dogs, but has only been trained on images of brown and black dogs. If this system sees a white dog for the first time, it might confidently assert that it’s not a dog. This is an “unknown unknown” – trained on incomplete data, the classifier has no idea that it’s completely wrong.

ML systems can be programmed to ask for human oversight on known unknowns, but since they don’t recognize unknown unknowns, they can’t easily ask for oversight. Weld’s research team is developing techniques to facilitate this, and he believes that it will complement explainability. “After finding unknown unknowns, the next thing the human probably wants is to know WHY the learner made those mistakes, and why it was so confident,” he explains.

Machines don’t “think” like humans do, but that doesn’t mean researchers can’t engineer them to explain their decisions.

One research group jointly trained a ML classifier to recognize images of birds and generate captions. If the AI recognizes a toucan, for example, the researchers can ask “why.” The neural net can then generate an explanation that the huge, colorful bill indicated a toucan.

While AI developers will prefer certain concepts explained graphically, consumers will need these interactions to involve natural language and more simplified explanations. “Any explanation is built on simplifying assumptions, but there’s a tricky judgment question about what simplifying assumptions are OK to make. Different audiences want different levels of detail,” says Weld.

Explaining the bird’s huge, colorful bill might suffice in image recognition tasks, but with medical diagnoses and financial trades, researchers and users will want more. Like a teacher-student relationship, human and machine should be able to discuss what the AI has learned and where it still needs work, drilling down on details when necessary.

“We want to find mistakes in their reasoning, understand why they’re making these mistakes, and then work towards correcting them,” Weld adds.    

 

Managing Unpredictable Behavior

Yet, ML systems will inevitably surprise researchers. Weld explains, “The system can and will find some way of achieving its objective that’s different from what you thought.”

Governments and businesses can’t afford to deploy highly intelligent AI systems that make unexpected, harmful decisions, especially if these systems control the stock market, power grids, or data privacy. To control this unpredictability, Weld wants to engineer AIs to get approval from humans before executing novel plans.

“It’s a judgment call,” he says. “If it has seen humans executing actions 1-3, then that’s a normal thing. On the other hand, if it comes up with some especially clever way of achieving the goal by executing this rarely-used action number 5, maybe it should run that one by a live human being.”

Over time, this process will create norms for AIs, as they learn which actions are safe and which actions need confirmation.

 

Implications for Current AI Systems

The people that use AI systems often misunderstand their limitations. The doctor using an AI to catch disease hasn’t trained the AI and can’t understand its machine learning. And the AI system, not programmed to explain its decisions, can’t communicate problems to the doctor.

Weld wants to see an AI system that interacts with a pre-trained ML system and learns how the pre-trained system might fail. This system could analyze the doctor’s new diagnostic software to find its blind spots, such as its unknown unknowns. Explainable AI software could then enable the AI to converse with the doctor, answering questions and clarifying uncertainties.

And the applications extend to finance algorithms, personal assistants, self-driving cars, and even predicting recidivism in the legal system, where explanation could help root out bias. ML systems are so complex that humans may never be able to understand them completely, but this back-and-forth dialogue is a crucial first step.

“I think it’s really about trust and how can we build more trustworthy AI systems,” Weld explains. “The more you interact with something, the more shared experience you have, the more you can talk about what’s going on. I think all those things rightfully build trust.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

How to Design AIs That Understand What Humans Want: An Interview with Long Ouyang

As artificial intelligence becomes more advanced, programmers will expect to talk to computers like they talk to humans. Instead of typing out long, complex code, we’ll communicate with AI systems using natural language.

With a current model called “program synthesis,” humans can get computers to write code for them by giving them examples and demonstrations of concepts, but this model is limited. With program synthesis, computers are literalists: instead of reading between the lines and considering intentions, they just do what’s literally true, and what’s literally true isn’t always what humans want.

If you asked a computer for a word starting with the letter “a,” for example, it might just return “a.” The word “a” literally satisfies the requirements of your question, but it’s not what you wanted. Similarly, if you asked an AI system “Can you pass the salt?” the AI might just remain still and respond, “Yes.” This behavior, while literally consistent with the requirements, is ultimately invalid because the AI didn’t pass you the salt.

Computer scientist Stuart Russell gives an example of a robot vacuum cleaner that someone instructs to “pick up as much dirt as possible.” Programmed to interpret this literally and not to consider intentions, the vacuum cleaner might find a single patch of dirt, pick it up, put it back down, and then repeatedly pick it up and put it back down – efficiently maximizing the vertical displacement of dirt, which it considers “picking up as much dirt as possible.”

It’s not hard to imagine situations in which this tendency for computers to interpret statements literally and rigidly can become extremely unsafe.

 

Pragmatic Reasoning: Truthful vs. Helpful

As AI systems assume greater responsibility in finance, military operations, and resource allocation, we cannot afford to have them bankrupt a city, bomb an ally country, or neglect an impoverished region because they interpret commands too literally.

To address this communication failure, Long Ouyang is working to “humanize” programming in order to prevent people from accidentally causing harm because they said something imprecise or mistaken to a computer. He explains: “As AI continues to develop, we’ll see more advanced AI systems that receive instructions from human operators – it will be important that these systems understand what the operators mean, as opposed to merely what they say.”

Ouyang has been working on improving program synthesis through studying pragmatic reasoning – the process of thinking about what someone did say as well as what he or she didn’t say. Humans do this analysis constantly when interpreting the meaning behind someone’s words. By reading between the lines, people learn what someone intends and what is helpful to them, instead of what is literally “true.”

Suppose a student asked a professor if she liked his paper, and the professor said she liked “some parts” of it. Most likely, the student would assume that the professor didn’t like other parts of his paper. After all, if the professor liked all of the paper, she would’ve said so.

This pragmatic reasoning is common sense for humans, but program synthesis won’t make the connection. In conversation, the word “some” clearly means “not all,” but in mathematical logic, “some” just means “any amount more than zero.” Thus for the computer, which only understands things in a mathematically logical sense, the fact that the professor liked some parts of the paper doesn’t rule out the possibility that she liked all parts.

To better understand how AI systems can learn to reason pragmatically and avoid these misinterpretations, Ouyang is studying how people interpret language and instructions from other people.

In one test, Ouyang gives a subject three data points – A, AAA, and AAAAA – and the subject has to work backwards to determine the rule for the sequence – i.e. what the experimenter is trying to convey with the examples. In this case, a human subject might quickly determine that all data points have an odd number of As, and so the rule is that the data points must have an odd number of As.

But there’s more to this process of determining the probability of certain rules. Cognitive scientists model our thinking process in these situations as Bayesian inference – a method of combining new evidence with prior beliefs to determine whether a hypothesis (or rule) is true.

As literal synthesizers, computers can only do a limited version of Bayesian inference. They consider how consistent the examples are with hypothesized rules, but they don’t consider how representative the examples are of the hypothesized rules. Specifically, literal synthesizers can only reason about the examples that weren’t presented in limited ways. Given the data set A, AAA, and AAAAA, a computer might logically conclude that the rule is that everything has to have the letter A. This rule is literally consistent with the examples, but it fails to represent or capture what the experimenter had in mind. Human subjects, conversely, understand that the experimenter purposely omitted the even-numbered examples AA and AAAA, and determine the rule accordingly.

By studying how humans use Bayesian inference, Ouyang is working to improve computers’ ability to recognize that the information it receives – such as the statement “I liked some parts of your paper” or the command “pick up as much dirt as possible” – was purposefully selected to convey something beyond the literal meaning. His goal is to produce a concrete tool – a pragmatic synthesizer – that people can use to more effectively communicate with computers.

The communication gap between computers and humans is one of the central problems in AI safety, and Ouyang hopes that a pragmatic synthesizer will help close this gap. If AIs can reason more deeply about what people say to them, they will more effectively create the beneficial outcomes that we want.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Towards a Code of Ethics in Artificial Intelligence with Paula Boddington

AI promises a smarter world – a world where finance algorithms analyze data better than humans, self-driving cars save millions of lives from accidents, and medical robots eradicate disease. But machines aren’t perfect. Whether an automated trading agent buys the wrong stock, a self-driving car hits a pedestrian, or a medical robot misses a cancerous tumor – machines will make mistakes that severely impact human lives.

Paula Boddington, a philosopher based in the Department of Computer Science at Oxford, argues that AI’s power for good and bad makes it crucial that researchers consider the ethical importance of their work at every turn. To encourage this, she is taking steps to lay the groundwork for a code of AI research ethics.

Codes of ethics serve a role in any field that impacts human lives, such as in medicine or engineering. Tech organizations like the Institute for Electronics and Electrical Engineers (IEEE) and the Association for Computing Machinery (ACM) also adhere to codes of ethics to keep technology beneficial, but no concrete ethical framework exists to guide all researchers involved in AI’s development. By codifying AI research ethics, Boddington suggests, researchers can more clearly frame AI’s development within society’s broader quest of improving human wellbeing.

To better understand AI ethics, Boddington has considered various areas including autonomous trading agents in finance, self-driving cars, and biomedical technology. In all three areas, machines are not only capable of causing serious harm, but they assume responsibilities once reserved for humans. As such, they raise fundamental ethical questions.

“Ethics is about how we relate to human beings, how we relate to the world, how we even understand what it is to live a human life or what our end goals of life are,” Boddington says. “AI is raising all of those questions. It’s almost impossible to say what AI ethics is about in general because there are so many applications. But one key issue is what happens when AI replaces or supplements human agency, a question which goes to the heart of our understandings of ethics.”

 

The Black Box Problem

Because AI systems will assume responsibility from humans – and for humans – it’s important that people understand how these systems might fail. However, this doesn’t always happen in practice.

Consider the Northpointe algorithm that US courts used to predict reoffending criminals. The algorithm weighed 100 factors such as prior arrests, family life, drug use, age and sex, and predicted the likelihood that a defendant would commit another crime. Northpointe’s developers did not specifically consider race, but when investigative journalists from ProPublica analyzed Northpointe, it found that the algorithm incorrectly labeled black defendants as “high risks” almost twice as often as white defendants. Unaware of this bias and eager to improve their criminal justice system, states like Wisconsin, Florida, and New York trusted the algorithm for years to determine sentences. Without understanding the tools they were using, these courts incarcerated defendants based on flawed calculations.

The Northpointe case offers a preview of the potential dangers of deploying AI systems that people don’t fully understand. Current machine-learning systems operate so quickly that no one really knows how they make decisions – not even the people who develop them. Moreover, these systems learn from their environment and update their behavior, making it more difficult for researchers to control and understand the decision-making process. This lack of transparency – the “black box” problem – makes it extremely difficult to construct and enforce a code of ethics.

Codes of ethics are effective in medicine and engineering because professionals understand and have control over their tools, Boddington suggests. There may be some blind spots – doctors don’t know everything about the medicine they prescribe – but we generally accept this “balance of risk.”

“It’s still assumed that there’s a reasonable level of control,” she explains. “In engineering buildings there’s no leeway to say, ‘Oh I didn’t know that was going to fall down.’ You’re just not allowed to get away with that. You have to be able to work it out mathematically. Codes of professional ethics rest on the basic idea that professionals have an adequate level of control over their goods and services.”

But AI makes this difficult. Because of the “black box” problem, if an AI system sets a dangerous criminal free or recommends the wrong treatment to a patient, researchers can legitimately argue that they couldn’t anticipate that mistake.

“If you can’t guarantee that you can control it, at least you could have as much transparency as possible in terms of telling people how much you know and how much you don’t know and what the risks are,” Boddington suggests. “Ethics concerns how we justify ourselves to others. So transparency is a key ethical virtue.”

 

Developing a Code of Ethics

Despite the “black box” problem, Boddington believes that scientific and medical communities can inform AI research ethics. She explains: “One thing that’s really helped in medicine and pharmaceuticals is having citizen and community groups keeping a really close eye on it. And in medicine there are quite a few “maverick” or “outlier” doctors who question, for instance, what the end value of medicine is. That’s one of the things you need to develop codes of ethics in a robust and responsible way.”

A code of AI research ethics will also require many perspectives. “I think what we really need is diversity in terms of thinking styles, personality styles, and political backgrounds, because the tech world and the academic world both tend to be fairly homogeneous,” Boddington explains.

Not only will diverse perspectives account for different values, but they also might solve problems better, according to research from economist Lu Hong and political scientist Scott Page. Hong and Page found that if you compare two groups solving a problem – one homogeneous group of people with very high IQs, and one diverse group of people with lower IQs – the diverse group will probably solve the problem better.

 

Laying the Groundwork

This fall, Boddington will release the main output of her project: a book titled Towards a Code of Ethics for Artificial Intelligence. She readily admits that the book can’t cover every ethical dilemma in AI, but it should help demonstrate how tricky it is to develop codes of ethics for AI and spur more discussion on issues like how codes of professional ethics can deal with the “black box” problem.

Boddington has also collaborated with the IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems, which recently released a report exhorting researchers to look beyond the technical capabilities of AI, and “prioritize the increase of human wellbeing as our metric for progress in the algorithmic age.”

Although a formal code is only part of what’s needed for the development of ethical AI, Boddington hopes that this discussion will eventually produce a code of AI research ethics. With a robust code, researchers will be better equipped to guide artificial intelligence in a beneficial direction.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Aligning Superintelligence With Human Interests

The trait that currently gives humans a dominant advantage over other species is intelligence. Human advantages in reasoning and resourcefulness have allowed us to thrive. However, this may not always be the case.

Although superintelligent AI systems may be decades away, Benya Fallenstein – a research fellow at the Machine Intelligence Research Institute – believes “it is prudent to begin investigations into this technology now.” The more time scientists and researchers have to prepare for a system that could eventually be smarter than us, the better.

A smarter-than-human AI system could potentially develop the tools necessary to exert control over humans. At the same time, highly capable AI systems may not possess a human sense of fairness, compassion, or conservatism. Consequently, the AI system’s single-minded pursuit of its programmed goals could cause it to deceive programmers, attempt to seize resources, or otherwise exhibit adversarial behaviors.

Fallenstein believes researchers must “ensure that AI would behave in ways that are reliably aligned with human interests.” However, even highly-reliable agent programming does not guarantee a positive impact; the effects of the system still depend upon whether it is pursuing human-approved goals. A superintelligent system may find clever, unintended ways to achieve the specific goals that it is given.

For example, imagine a super intelligent system designed to cure cancer “without doing anything bad.” This goal is rooted in cultural context and shared human knowledge. The AI may not completely understand what qualifies as “bad.” Therefore, it may try to cure cancer by stealing resources, proliferating robotic laboratories at the expense of the biosphere, kidnapping test subjects, or all of the above.

If a current AI system gets out of hand, researchers simply shut it down and modify its source code. However, modifying super-intelligent systems could prove to be more difficult, if not impossible. A system could acquire new hardware, alter its software, or take other actions that would leave the original programmers with only dubious control over the agent. And since most programmed goals are better achieved if the system stays operational and continues pursuing its goals than if it is deactivated or its goals are changed, systems will naturally tend to have an incentive to resist shutdown and to resist modifications to their goals.

Fallenstein explains that, in order to ensure that the development of super-intelligent AI has a positive impact on the world, “it must be constructed in such a way that it is amenable to correction, even if it has the ability to prevent or avoid correction.” The goal is not to design systems that fail in their attempts to deceive the programmers; the goal is to understand how highly intelligent and general-purpose reasoners with flawed goals can be built to have no incentives to deceive programmers in the first place. Instead, the intent is for the first highly capable systems to be “corrigible”—i.e., for them to recognize that their goals and other features are works in progress, and to work with programmers to identify and fix errors.

Little is known about the design or implementation details of such systems because everything, at this point, is hypothetical — no super-intelligent AI systems exist yet. As a consequence, the research described below focuses on formal agent foundations for AI alignment research — that is, on developing the basic conceptual tools and theories that are most likely to be useful for engineering robustly beneficial systems in the future.

Active research into this is focused on small “toy” problems and models of corrigible agents, in the hope that insight gained there could be applied to more realistic and complex versions of the problems. Fallenstein and her team sought to illuminate the key difficulties of AI using these models. One such toy problem is the “shutdown problem,” which involves designing a set of preferences that incentivize an agent to shut down upon the press of a button without also incentivizing the agent to either cause or prevent the pressing of that button. This would tell researchers whether a utility function could be specified such that agents using that function switch their preferences on demand, without having incentives to cause or prevent the switching.

Studying models in this formal logical setting has led to partial solutions, and further research that drives the development of methods for reasoning under logical uncertainty may continue.

The largest result thus far under this research program is “logical induction,” a line of research led by Scott Garrabrant. It functions as a new model of deductively-limited reasoning.

The kind of uncertainty we have about mathematical questions that are too difficult for us to settle one way or another right this moment is logical uncertainty. For example, a typical human mind can’t quickly answer the question:

What’s the 10100th digit of Pi?

Further, nobody has the computational resources to solve this in a reasonable amount of time. Despite this, mathematicians have lots of theories about how likely mathematical conjectures are to be true. As such, they must be implicitly using some sort of criterion that can be used to judge the probability that a mathematical statement is true or not. This type of “logical induction” proves that a computable logical inductor (an algorithm producing probability assignments that satisfy logical induction) exists.

The research team presented a computable algorithm that outpaces deduction, assigning high subjective probabilities to provable conjectures and low probabilities to disprovable conjectures long before the proofs can be produced. Among other accomplishments, the algorithm learns to reason competently about its own beliefs and trust its future beliefs while avoiding paradox. This gives some formal backing to the thought that real-world probabilistic agents can often be reasonably confident in their future reasoning in practice.

The team believes “there’s a good chance that this framework will open up new avenues of study in questions of metamathematics, decision theory, game theory, and computational reflection that have long seemed intractable.” They are also “cautiously optimistic” that they’ll improve our understanding of decision theory and counterfactual reasoning, and other problems related to AI value alignment.

At the same time, Fallenstein’s team doesn’t believe that all parts of the problem must be solved in advance. In fact, “the task of designing smarter, safer, more reliable systems could be delegated to early smarter-than-human systems.” This can only happen, though, as long as the research done by the AI can be trusted.

According to Fallenstein, this “call to arms” is vital, and “significant effort must be focused on the study of superintelligence alignment as soon as possible.” It is important to develop a formal understanding of AI alignment well in advance of making design decisions about smarter-than-human systems. By beginning the work early, humans inevitably face the risk that it may turn out to be irrelevant. However, failing to prepare could be even worse.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Using History to Chart the Future of AI: An Interview with Katja Grace

The million-dollar question in AI circles is: When? When will artificial intelligence become so smart and capable that it surpasses human beings at every task?

AI is already visible in the world through job automation, algorithmic financial trading, self-driving cars and household assistants like Alexa, but these developments are trivial compared to the idea of artificial general intelligence (AGI) – AIs that can perform a broad range of intellectual tasks just as humans can. Many computer scientists expect AGI at some point, but hardly anyone agrees on when it will be developed.

Given the unprecedented potential of AGI to create a positive or destructive future for society, many worry that humanity cannot afford to be surprised by its arrival. A surprise is not inevitable, however, and Katja Grace believes that if researchers can better understand the speed and consequences of advances in AI, society can prepare for a more beneficial outcome.

 

AI Impacts

Grace, a researcher for the Machine Intelligence Research Institute (MIRI), argues that, while we can’t chart the exact course of AI’s improvement, it is not completely unpredictable. Her project AI Impacts is dedicated to identifying and conducting cost-effective research projects that can shed light on when and how AI will impact society in the coming years. She aims to “help improve estimates of the social returns to AI investment, identify neglected research areas, improve policy, or productively channel public interest in AI.”

AI Impacts asks such questions as: How rapidly will AI develop? How much advanced notice should we expect to have of disruptive change? What are the likely economic impacts of human-level AI? Which paths to AI should be considered plausible or likely? Can we say anything meaningful about the impact of contemporary choices on long-term outcomes?

One way to get an idea of these timelines is to ask the experts. In AI Impacts’ 2015 survey of 352 AI researchers, these researchers predicted a 50 percent chance that AI will outcompete humans in almost everything by 2060. However the experts also answered a very similar question with a date seventy-five years later, and gave a huge range of answers individually, making it difficult to rule anything out. Grace hopes her research with AI Impacts will inform and improve these estimates.

 

Learning from History

Some thinkers believe that AI could progress rapidly, without much warning. This is based on the observation that algorithms don’t need factories, and so could in principle progress at the speed of a lucky train of thought.

However, Grace argues that while we have not developed human-level AI before, our vast experience developing other technologies can tell us a lot about what will happen with AI. Studying the timelines of other technologies can inform the AI timeline.

In one of her research projects, Grace studies jumps in technological progress throughout history, measuring these jumps in terms of how many years of progress happen in one ‘go’. “We’re interested in cases where more than a decade of progress happens in one go,” she explains. “The case of nuclear weapons is really the only case we could find that was substantially more than 100 years of progress in one go.”

For example, physicists began to consider nuclear energy in 1939, and by 1945 the US successfully tested a nuclear weapon. As Grace writes, “Relative effectiveness [of explosives] doubled less than twice in the 1100 years prior to nuclear weapons, then it doubled more than eleven times when the first nuclear weapons appeared. If we conservatively model previous progress as exponential, this is around 6000 years of progress in one step [compared to] previous rates.”

Grace also considered the history of high-temperature superconductors. Since the discovery of superconductors in 1911, peak temperatures for superconduction rose slowly, growing from 4K (Kelvin) initially to about 30K in the 1980s. Then in 1986, scientists discovered a new class of ceramics that increased the maximum temperature to 130K in just seven years. “That was close to 100 years of progress in one go,” she explains.

Nuclear weapons and superconductors are rare cases – most of the technologies that Grace has studied either don’t demonstrate discontinuity, or only show about 10-30 years of progress in one go. “The main implication of what we have done is that big jumps are fairly rare, so that should not be the default expectation,” Grace explains.

Furthermore, AI’s progress largely depends on how fast hardware and software improve, and those are processes we can observe now. For instance, if hardware progress starts to slow from its long run exponential progress, we should expect AI later.

Grace is currently investigating these unknowns about hardware. She wants to know “how fast the price of hardware is decreasing at the moment, how much hardware helps with AI progress relative to e.g. algorithmic improvements, and how custom hardware matters.”

 

Intelligence Explosion

AI researchers and developers must also be prepared for the possibility of an intelligence explosion – the idea that strong AI will improve its intelligence faster than humans could possibly understand or control.

Grace explains: “The thought is that once the AI becomes good enough, the AI will do its own AI research (instead of humans), and then we’ll have AI doing AI research where the AI research makes the AI smarter and then the AI can do even better AI research. So it will spin out of control.”

But she suggests that this feedback loop isn’t entirely unpredictable. “We already have intelligent [people] doing AI research that leads to better capabilities,” Grace explains. “We don’t have a perfect idea of what those things will be like when the AI is as intelligent as humans or as good at AI research, but we have some evidence about it from other places and we shouldn’t just be saying the spinning out of control could happen at any speed. We can get some clues about it now. We can say something about how many extra IQ points of AI you get for a year of research or effort, for example.”

AI Impacts is an ongoing project, and Grace hopes her research will find its way into conversations about intelligence explosions and other aspects of AI. With better-informed timeline estimates, perhaps policymakers and philanthropists can more effectively ensure that advanced AI doesn’t catch humanity by surprise.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Artificial Intelligence and the Future of Work: An Interview With Moshe Vardi

“The future of work is now,” says Moshe Vardi. “The impact of technology on labor has become clearer and clearer by the day.”

Machines have already automated millions of routine, working-class jobs in manufacturing. And now, AI is learning to automate non-routine jobs in transportation and logistics, legal writing, financial services, administrative support, and healthcare.

Vardi, a computer science professor at Rice University, recognizes this trend and argues that AI poses a unique threat to human labor.

 

Initiating a Policy Response

From the Luddite movement to the rise of the Internet, people have worried that advancing technology would destroy jobs. Yet despite painful adjustment periods during these changes, new jobs replaced old ones, and most workers found employment. But humans have never competed with machines that can outperform them in almost anything. AI threatens to do this, and many economists worry that society won’t be able to adapt.

“What people are now realizing is that this formula that technology destroys jobs and creates jobs, even if it’s basically true, it’s too simplistic,” Vardi explains.

The relationship between technology and labor is more complex: Will technology create enough jobs to replace those it destroys? Will it create them fast enough? And for workers whose skills are no longer needed – how will they keep up?

To address these questions and consider policy responses, Vardi will hold a summit in Washington, D.C. on December 12, 2017. The summit will address six current issues within technology and labor: education and training, community impact, job polarization, contingent labor, shared prosperity, and economic concentration.

Education and training

A 2013 computerization study found that 47% of American workers held jobs at high risk of automation in the next decade or two. If this happens, technology must create roughly 100 million jobs.

As the labor market changes, schools must teach students skills for future jobs, while at-risk workers need accessible training for new opportunities. Truck drivers won’t transition easily to website design and coding jobs without proper training, for example. Vardi expects that adapting to and training for new jobs will become more challenging as AI automates a greater variety of tasks. 

Community impact

Manufacturing jobs are concentrated in specific regions where employers keep local economies afloat. Over the last thirty years, the loss of 8 million manufacturing jobs has crippled Rust Belt regions in the U.S. – both economically and culturally.

Today, the fifteen million jobs that involve operating a vehicle are concentrated in certain regions as well. Drivers occupy up to 9% of jobs in the Bronx and Queens districts of New York City, up to 7% of jobs in select Southern California and Southern Texas districts, and over 4% in Wyoming and Idaho. Automation could quickly assume the majority of these jobs, devastating the communities that rely on them.

Job polarization

“One in five working class men between ages 25 to 54 without college education are not working,” Vardi explains. “Typically, when we see these numbers, we hear about some country in some horrible economic crisis like Greece. This is really what’s happening in working class America.”

Employment is currently growing in high-income cognitive jobs and low-income service jobs, such as elderly assistance and fast-food service, which computers cannot automate yet. But technology is hollowing out the economy by automating middle-skill, working-class jobs first.

Many manufacturing jobs pay $25 per hour with benefits, but these jobs aren’t easy to come by. Since 2000, when millions of these jobs disappeared, displaced workers have either left the labor force or accepted service jobs that often pay $12 per hour, without benefits.

Truck driving, the most common job in over half of US states, may see a similar fate.

Source: IPUMS-CPS/ University of Minnesota Credit: Quoctrung Bui/NPR

 

Contingent labor

Increasingly, communications technology allows firms to save money by hiring freelancers and independent contractors instead of permanent workers. This has created the Gig Economy – a labor market characterized by short-term contracts and flexible hours at the cost of unstable jobs with fewer benefits. By some estimates, in 2016, one in three workers were employed in the Gig Economy, but not all by choice. Policymakers must ensure that this new labor market supports its workers.

Shared prosperity

Automation has decoupled job creation from economic growth, allowing the economy to grow while employment and income shrink, thus increasing inequality. Vardi worries that AI will accelerate these trends. He argues that policies encouraging economic growth must also support economic mobility for the middle class.

Economic concentration

Technology creates a “winner-takes-all” environment, where second best can hardly survive. Bing search is quite similar to Google search, but Google is much more popular than Bing. And do Facebook or Amazon have any legitimate competitors?

Startups and smaller companies struggle to compete with these giants because of data. Having more users allows companies to collect more data, which machine-learning systems then analyze to help companies improve. Vardi thinks that this feedback loop will give big companies long-term market power.

Moreover, Vardi argues that these companies create relatively few jobs. In 1990, Detroit’s three largest companies were valued at $65 billion with 1.2 million workers. In 2016, Silicon Valley’s three largest companies were valued at $1.5 trillion but with only 190,000 workers.

 

Work and society

Vardi primarily studies current job automation, but he also worries that AI could eventually leave most humans unemployed. He explains, “The hope is that we’ll continue to create jobs for the vast majority of people. But if the situation arises that this is less and less the case, then we need to rethink: how do we make sure that everybody can make a living?”

Vardi also anticipates that high unemployment could lead to violence or even uprisings. He refers to Andrew McAfee’s closing statement at the 2017 Asilomar AI Conference, where McAfee said, “If the current trends continue, the people will rise up before the machines do.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

How Self-Driving Cars Use Probability

Even though human drivers don’t consciously think in terms of probabilities, we observe our environment and make decisions based on the likelihood of certain things happening. A driver doesn’t calculate the probability that the sports car behind her will pass her, but through observing the car’s behavior and considering similar situations in the past, she makes her best guess.

We trust probabilities because it is the only way to take action in the midst of uncertainty.

Autonomous systems such as self-driving cars will make similar decisions based on probabilities, but through a different process. Unlike a human who trusts intuition and experience, these autonomous cars calculate the probability of certain scenarios using data collectors and reasoning algorithms.

 

How to Determine Probability

Stefano Ermon, a computer scientist at Stanford University, wants to make self-driving cars and autonomous systems safer and more reliable by improving the way they reason probabilistically about their environment. He explains, “The challenge is that you have to take actions and you don’t know what will happen next. Probabilistic reasoning is just the idea of thinking about the world in terms of probabilities, assuming that there is uncertainty.”

There are two main components to achieve safety. First, the computer model must collect accurate data, and second, the reasoning system must be able to draw the right conclusions from the model’s data.

Ermon explains, “You need both: to build a reliable model you need a lot of data, and then you need to be able to draw the right conclusions based on the model, and that requires the artificial intelligence to think about these models accurately. Even if the model is right, but you don’t have a good way to reason about it, you can do catastrophic things.”

For example, in the context of autonomous vehicles, models use various sensors to observe the environment and collect data about countless variables, such as the behavior of the drivers around you, potholes and other obstacles in front of you, weather conditions—every possible data point.

A reasoning system then interprets this data. It uses the model’s information to decide whether the driver behind you is dangerously aggressive, if the pothole ahead will puncture your tire, if the rain is obstructing visibility, and the system continuously changes the car’s behavior to respond to these variables.

Consider the aggressive driver behind you. As Ermon explains, “Somehow you need to be able to reason about these models. You need to come up with a probability. You don’t know what the car’s going to do but you can estimate, and based on previous behavior you can say this car is likely to cut the line because it has been driving aggressively.”

 

Improving Probabilistic Reasoning

Ermon is creating strong algorithms that can synthesize all of the data that a model produces and make reliable decisions.

As models improve, they collect more information and capture more variables relevant to making these decisions. But as Ermon notes, “the more complicated the model is, the more variables you have, the more complicated it becomes to make the optimal decisions based on the model.”

Thus as the data collection expands, the analysis must also improve. The artificial intelligence in these cars must be able to reason with this increasingly complex data.

And this reasoning can easily go wrong. “You need to be very precise when computing these probabilities,” Ermon explains. “If the probability that a car cuts into your lane is 0.1, but you completely underestimate it and say it’s 0.01, you might end up making a fatal decision.”

To avoid fatal decisions, the artificial intelligence must be robust, but the data must also be complete. If the model collects incomplete data, “you have no guarantee that the number that you get when you run this algorithm has anything to do with the actual probability of that event,” Ermon explains.

The model and the algorithm entirely depend on each other to produce the optimal decision. If the model is incomplete and fails to capture the black ice in front of you, no reasoning system will be able to make a safe decision. And even if the model captures the black ice and every other possible variable, if the reasoning system cannot handle the complexity of this data, again the car will fail.

 

How Safe Will Autonomous Systems Be?

The technology in self-driving cars has made huge leaps lately, and Ermon is hopeful. “Eventually, as computers get better and algorithms get better and the models get better, hopefully we’ll be able to prevent all accidents,” he suggests.

However, there are still fundamental limitations on probabilistic reasoning. “Most computer scientists believe that it is impossible to come up with the silver bullet for this problem, an optimal algorithm that is so powerful that it can reason about all sorts of models that you can think about,” Ermon explains. “That’s the key barrier.”

But despite this barrier, self-driving cars will soon be available for consumers. Ford, for one, has promised to put its self-driving cars on the road by 2021. And while most computer scientists expect these cars to be far safer than human drivers, their success depends on their ability to reason probabilistically about their environment.

As Ermon explains, “You need to be able to estimate these kinds of probabilities because they are the building blocks that you need to make decisions.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Making Deep Learning More Robust

Imagine how much more efficient lawyers could be if they had the time to read every legal book ever written and review every case ever brought to court. Imagine doctors with the ability to study every advancement published across the world’s medical journals, or consult every medical case, ever. Unfortunately, the human brain cannot store that much information, and it would take decades to achieve these feats.

But a computer, one specifically designed to work like the human mind, could.

Deep learning neural networks are designed to mimic the human brain’s neural connections. They are capable of learning through continuous exposure to huge amounts of data. This allows them to recognize patterns, comprehend complex concepts, and translate high-level abstractions. These networks consist of many layers, each having a different set of weights. The deeper the network, the stronger it is.

Current applications for these networks include medical diagnosis, robotics and engineering, face recognition, and automotive navigation. However, deep learning is still in development – not surprisingly, it is a huge undertaking to get machines to think like humans. In fact, very little is understood about these networks, and months of manual tuning are often required for obtaining excellent performance.

Fuxin Li, assistant professor at the Oregon State University School of Electrical Engineering and Computer Science, and his team are taking on the accuracy of these neural networks under adversarial conditions. Their research focuses on the basic machine learning aspects of deep learning, and how to make general deep learning more robust.

To try to better understand when a deep convolutional neural network (CNN) is going to be right or wrong, Li’s team had to establish an estimate of confidence in the predictions of the deep learning architecture. Those estimates can be used as safeguards when utilizing the networks in real life.

“Basically,” explains Li, “trying to make deep learning increasingly self-aware – to be aware of what type of data it has seen, and what type of data it could work on.”

The team looked at recent advances in deep learning, which have greatly improved the capability to recognize images automatically. Those networks, albeit very resistant to overfitting, were discovered to completely fail if some of the pixels in such images were perturbed via an adversarial optimization algorithm.

To a human observer, the image in question may look fine, but the deep network sees otherwise. According to the researchers, those adversarial examples are dangerous if a deep network is utilized into any crucial real application, such as autonomous driving. If the result of the network can be hacked, wrong authentications and other devastating effects would be unavoidable.

In a departure from previous perspectives that focused on improving the classifiers to correctly organize the adversarial examples, the team focused on detecting those adversarial examples by analyzing whether they come from the same distribution as the normal examples. The accuracy for detecting adversarial examples exceeded 96%. Notably, 90% of the adversarials can be detected with a false positive rate of less than 10%.

The benefits of this research are numerous. It is vital for a neural network to be able to identify whether an example comes from a normal or an adversarial distribution. Such knowledge, if available, will help significantly to control behaviors of robots employing deep learning. A reliable procedure can prevent robots from behaving in an undesirable manner because of the false perceptions it made about the environment.

Li gives one example: “In robotics there’s this big issue about robots not doing something based on erroneous perception. It’s important for a robot to know that it’s not making a confident perception. For example, if [the robot] is saying there’s an object over there, but it’s actually a wall, he’ll go to fetch that object, and then he hits a wall.”

Hopefully, Li says, that won’t happen. However, current software and machine learning have been mostly based solely on prediction confidence within the original machine learning framework. Basically, the testing and training data are assumed to be pulled from the same distribution independently, and that can lead to incorrect assumptions.

Better confidence estimates could potentially help avoid incidents such as the Tesla crash scenario from May 2016, where an adversarial example (truck with too much light) was in the middle of the highway that cheated the system. A confidence estimate could potentially solve that issue. But first, the computer must be smarter. The computer has to learn to detect objects and differentiate, say, a tree from another vehicle.

“To make it really robust, you need to account for unknown objects. Something weird may hit you. A deer may jump out.” The network can’t be taught every unexpected situation, says Li, “so you need it to discover them without knowledge of what they are. That’s something that we do. We try to bridge the gap.”

Training procedures will make deep learning more automatic and lead to fewer failures, as well as confidence estimates when the deep network is utilized to predict new data. Most of this training, explains Li, comes from photo distribution using stock images. However, these are flat images much different than what a robot would normally see in day-to-day life. It’s difficult to get a 360-degree view just by looking at photos.

“There will be a big difference between the thing [the robot] trains on and the thing it really sees. So then, it is important for the robot to understand that it can predict some things confidently, and others it cannot,” says Li. “[The robot] needs to understand that it probably predicted wrong, so as not to act too aggressively toward its prediction.” This can only be achieved with a more self-aware framework, which is what Li is trying to develop with this grant.

Further, these estimates can be used to control the behavior of a robot employing deep learning so that it will not go on to perform maneuvers that could be dangerous because of erroneous predictions. Understanding these aspects would also be helpful in designing potentially more robust networks in the future.

Soon, Li and his team will start generalizing the approach to other domains, such as temporal models (RNNs, LSTMs) and deep reinforcement learning. In reinforcement learning, the confidence estimates could play an important role in many decision-making paradigms.

Li’s most recent update on this work can be found here.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

The Financial World of AI

Automated algorithms currently manage over half of trading volume in US equities, and as AI improves, it will continue to assume control over important financial decisions. But these systems aren’t foolproof. A small glitch could send shares plunging, potentially costing investors billions of dollars.

For firms, the decision to accept this risk is simple. The algorithms in automated systems are faster and more accurate than any human, and deploying the most advanced AI technology can keep firms in business.

But for the rest of society, the consequences aren’t clear. Artificial intelligence gives firms a competitive edge, but will these rapidly advancing systems remain safe and robust? What happens when they make mistakes?

 

Automated Errors

Michael Wellman, a professor of computer science at the University of Michigan, studies AI’s threats to the financial system. He explains, “The financial system is one of the leading edges of where AI is automating things, and it’s also an especially vulnerable sector. It can be easily disrupted, and bad things can happen.”

Consider the story of Knight Capital. On August 1, 2012, Knight decided to try out new software to stay competitive in a new trading pool. The software passed its safety tests, but when Knight deployed it, the algorithm activated its testing software instead of the live trading program. The testing software sent millions of bad orders in the following minutes as Knight frantically tried to stop it. But the damage was done.

In just 45 minutes, Knight Capital lost $440 million – nearly four times their profit in 2011 – all because of one line of code.

In this case, the damage was constrained to Knight, but what happens when one line of code can impact the entire financial system?

 

Understanding Autonomous Trading Agents

Wellman argues that autonomous trading agents are difficult to control because they process and respond to information at unprecedented speeds, they can be easily replicated on a large scale, they act independently, and they adapt to their environment.

With increasingly general capabilities, systems may learn to make money in dangerous ways that their programmers never intended. As Lawrence Pingree, an analyst at Gartner, said after the Knight meltdown, “Computers do what they’re told. If they’re told to do the wrong thing, they’re going to do it and they’re going to do it really, really well.”

In order to prevent AI systems from undermining market transparency and stability, government agencies and academics must learn how these agents work.

 

Market Manipulation

Even benign uses of AI can hinder market transparency, but Wellman worries that AI systems will learn to manipulate markets.

Autonomous trading agents are especially effective at exploiting arbitrage opportunities – where they simultaneously purchase and sell an asset to profit from pricing differences. If, for example, a stock trades at $30 in one market and $32 in a second market, an agent can buy the $30 stock and immediately sell it for $32 in the second market, making a $2 profit.

Market inefficiency naturally creates arbitrage opportunities. However, an AI may learn – on its own – to create pricing discrepancies by taking misleading actions that move the market to generate profit.

One manipulative technique is ‘spoofing’ – the act of bidding for a stock item with the intent to cancel the bid before execution. This moves the market in a certain direction, and the spoofer profits from the false signal.

Wellman and his team recently reproduced spoofing in their laboratory models, as part of an effort to understand the situations where spoofing can be effective. He explains, “We’re doing this in the laboratory to see if we can characterize the signature of AIs doing this, so that we reliably detect it and design markets to reduce vulnerability.”

As agents improve, they may learn to exploit arbitrage more maliciously by creating artificial items on the market to mislead traders, or by hacking accounts to report false events that move markets. Wellman’s work aims to produce methods to help control such manipulative behavior.

 

Secrecy in the Financial World

But the secretive nature of finance prevents academics from fully understanding the role of AI.

Wellman explains, “We know they use AI and machine learning to a significant extent, and they are constantly trying to improve their algorithms. We don’t know to what extent things like market manipulation and spoofing are automated right now, but we know that they could be automated and that could lead to something of an arms race between market manipulators and the systems trying to detect and run surveillance for market bad behavior.”

Government agencies – such as the Securities and Exchange Commission – watch financial markets, but “they’re really outgunned as far as the technology goes,” Wellman notes. “They don’t have the expertise or the infrastructure to keep up with how fast things are changing in the industry.”

But academics can help. According to Wellman, “even without doing the trading for money ourselves, we can reverse engineer what must be going on in the financial world and figure out what can happen.”

 

Preparing for Advanced AI

Although Wellman studies current and near-term AI, he’s concerned about the threat of advanced, general AI.

“One thing we can do to try to understand the far-out AI is to get experience with dealing with the near-term AI,” he explains. “That’s why we want to look at regulation of autonomous agents that are very near on the horizon or current. The hope is that we’ll learn some lessons that we can then later apply when the superintelligence comes along.”

AI systems are improving rapidly, and there is intense competition between financial firms to use them. Understanding and tracking AI’s role in finance will help financial markets remain stable and transparent.

“We may not be able to manage this threat with 100% reliability,” Wellman admits, “but I’m hopeful that we can redesign markets to make them safer for the AIs and eliminate some forms of the arms race, and that we’ll be able to get a good handle on preventing some of the most egregious behaviors.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Silo Busting in AI Research

Artificial intelligence may seem like a computer science project, but if it’s going to successfully integrate with society, then social scientists must be more involved.

Developing an intelligent machine is not merely a problem of modifying algorithms in a lab. These machines must be aligned with human values, and this requires a deep understanding of ethics and the social consequences of deploying intelligent machines.

Getting people with a variety of backgrounds together seems logical enough in theory, but in practice, what happens when computer scientists, AI developers, economists, philosophers, and psychologists try to discuss AI issues? Do any of them even speak the same language?

Social scientists and computer scientists will come at AI problems from very different directions. And if they collaborate, everybody wins. Social scientists can learn about the complex tools and algorithms used in computer science labs, and computer scientists can become more attuned to the social and ethical implications of advanced AI.

Through transdisciplinary learning, both fields will be better equipped to handle the challenges of developing AI, and society as a whole will be safer.

 

Silo Busting

Too often, researchers focus on their narrow area of expertise, rarely reaching out to experts in other fields to solve common problems. AI is no different, with thick walls – sometimes literally – separating the social sciences from the computer sciences. This process of breaking down walls between research fields is often called silo-busting.

If AI researchers largely operate in silos, they may lose opportunities to learn from other perspectives and collaborate with potential colleagues. Scientists might miss gaps in their research or reproduce work already completed by others, because they were secluded away in their silo. This can significantly hamper the development of value-aligned AI.

To bust these silos, Wendell Wallach organized workshops to facilitate knowledge-sharing among leading computer and social scientists. Wallach, a consultant, ethicist, and scholar at Yale University’s Interdisciplinary Center for Bioethics, holds these workshops at The Hastings Center, where he is a senior advisor.

With co-chairs Gary Marchant, Stuart Russell, and Bart Selman, Wallach held the first workshop in April 2016. “The first workshop was very much about exposing people to what experts in all of these different fields were thinking about,” Wallach explains. “My intention was just to put all of these people in a room and hopefully they’d see that they weren’t all reinventing the wheel, and recognize that there were other people who were engaged in similar projects.”

The workshop intentionally brought together experts from a variety of viewpoints, including engineering ethics, philosophy, and resilience engineering, as well as participants from the Institute of Electrical and Electronics Engineers (IEEE), the Office of Naval Research, and the World Economic Forum (WEF). Wallach recounts, “some were very interested in how you implement sensitivity to moral considerations in AI computationally, and others were more interested in how AI changes the societal context.”

Other participants studied how the engineers of these systems may be susceptible to harmful cognitive biases and conflicts of interest, while still others focused on governance issues surrounding AI. Each of these viewpoints is necessary for developing beneficial AI, and The Hastings Center’s workshop gave participants the opportunity to learn from and teach each other.

But silo-busting is not easy. Wallach explains, “everybody has their own goals, their own projects, their own intentions, and it’s hard to hear someone say, ‘maybe you’re being a little naïve about this.’” When researchers operate exclusively in silos, “it’s almost impossible to understand how people outside of those silos did what they did,” he adds.

The intention of the first workshop was not to develop concrete strategies or proposals, but rather to open researchers’ minds to the broad challenges of developing AI with human values. “My suspicion is, the most valuable things that came out of this workshop would be hard to quantify,” Wallach clarifies. “It’s more like people’s minds were being stretched and opened. That was, for me, what this was primarily about.”

The workshop did yield some tangible results. For example, Marchant and Wallach introduced a pilot project for the international governance of AI, and nearly everyone at the workshop agreed to work on it. Since then, the IEEE, the International Committee of the Red Cross, the UN, the World Economic Forum, and other institutions have agreed to become active partners with The Hastings Center in building global infrastructure to ensure that AI and Robotics are beneficial.

This transdisciplinary cooperation is a promising sign that Wallach’s efforts are succeeding in strengthening the global response to AI challenges.

 

Value Alignment

Wallach and his co-chairs held a second workshop at the end of October. The participants were mostly scientists, but also included social theorists, a legal scholar, philosophers, and ethicists. The overall goal remained – to bust AI silos and facilitate transdisciplinary cooperation – but this workshop had a narrower focus.

“We made it more about value alignment and machine ethics,” he explains. “The tension in the room was between those who thought the problem [of value alignment] was imminently solvable and those who were deeply skeptical about solving the problem at all.”

In general, Wallach observed that “the social scientists and philosophers tend to overplay the difficulties [of creating AI with full value alignment] and computer scientists tend to underplay the difficulties.”

Wallach believes that while computer scientists will build the algorithms and utility functions for AI, they will need input from social scientists to ensure value alignment. “If a utility function represents 100,000 inputs, social theorists will help the AI researchers understand what those 100,000 inputs are,” he explains. “The AI researchers might be able to come up with 50,000-60,000 on their own, but they’re suddenly going to realize that people who have thought much more deeply about applied ethics are perhaps sensitive to things that they never considered.”

“I’m hoping that enough of [these researchers] learn each other’s language and how to communicate with each other, that they’ll recognize the value they can get from collaborating together,” he says. “I think I see evidence of that beginning to take place.”

 

Moving Forward

Developing value-aligned AI is a monumental task with existential risks. Experts from various perspectives must be willing to learn from each other and adapt their understanding of the issue.

In this spirit, The Hastings Center is leading the charge to bring the various AI silos together. After two successful events that resulted in promising partnerships, Wallach and his co-chairs will hold their third workshop in Spring 2018. And while these workshops are a small effort to facilitate transdisciplinary cooperation on AI, Wallach is hopeful.

“It’s a small group,” he admits, “but it’s people who are leaders in these various fields, so hopefully that permeates through the whole field, on both sides.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Artificial Intelligence and the King Midas Problem

Value alignment. It’s a phrase that often pops up in discussions about the safety and ethics of artificial intelligence. How can scientists create AI with goals and values that align with those of the people it interacts with?

Very simple robots with very constrained tasks do not need goals or values at all. Although the Roomba’s designers know you want a clean floor, Roomba doesn’t: it simply executes a procedure that the Roomba’s designers predict will work—most of the time. If your kitten leaves a messy pile on the carpet, Roomba will dutifully smear it all over the living room. If we keep programming smarter and smarter robots, then by the late 2020s, you may be able to ask your wonderful domestic robot to cook a tasty, high-protein dinner. But if you forgot to buy any meat, you may come home to a hot meal but find the aforementioned cat has mysteriously vanished. The robot, designed for chores, doesn’t understand that the sentimental value of the cat exceeds its nutritional value.

AI and King Midas

Stuart Russell, a renowned AI researcher, compares the challenge of defining a robot’s objective to the King Midas myth. “The robot,” says Russell, “has some objective and pursues it brilliantly to the destruction of mankind. And it’s because it’s the wrong objective. It’s the old King Midas problem.”

This is one of the big problems in AI safety that Russell is trying to solve. “We’ve got to get the right objective,” he explains, “and since we don’t seem to know how to program it, the right answer seems to be that the robot should learn – from interacting with and watching humans – what it is humans care about.”

Russell works from the assumption that the robot will solve whatever formal problem we define. Rather than assuming that the robot should optimize a given objective, Russell defines the problem as a two-player game (“game” as used by economists, meaning a decision problem with multiple agents) called cooperative inverse reinforcement learning (CIRL).

A CIRL game includes a person and a robot: the robot’s only purpose is to make the person happy, but it doesn’t know what the person wants. Fortunately, it can learn more about what the person wants by observing her behavior. For example, if a robot observed the human’s morning routine, it should discover how important coffee is—not to itself, of course (we don’t want robots drinking coffee), but to the human. Then, it will make coffee for the person without being asked.

What about the example of King Midas, who thought he wanted as much gold as possible? How does the approach deal with someone stating their wishes incorrectly?

“We don’t have to know what we want. That’s the whole point. It’s hard to know what we want, and what we ask for is evidence for what we really want, but it’s not ground truth,” says Russell.

The robot should learn what people want by watching their actions. Then, if someone asks for something that’s technically incorrect, the robot will understand their underlying wish. Russell adds, “The robot—if it had been there—would have told Midas that he didn’t really want everything turned to gold, maybe just a few choice objects that he might point at from time to time and say the magic word.”

AI Off-Switch

Russell and his Berkeley colleagues also recently announced further progress toward ensuring safe AI, with a paper on ensuring an AI’s off-switch is always accessible. An off-switch is often suggested to guarantee a robot doesn’t harm humans – if the robot goes bad, just turn it off! But what if the AI anticipates that and disables its off switch?

Russell and his team propose solving this by providing a level of uncertainty in the robot’s objective. They created another CIRL game in which the robot chooses to announce its intentions, which gives the human a chance to switch it off, or to plunge ahead with its plan, bypassing human oversight.

They showed that a robot with a fixed objective will take the second choice and disable its off-switch. But if there is uncertainty about the objective, the robot will reason that a human decision to switch the robot off would imply the announced intention is undesirable. Because its goal is to make the human happy, the robot accepts being switched off. The robot has a positive incentive not to disable the off-switch, and that incentive is directly tied to the robot’s uncertainty about the human’s true objective.

Ensuring AI Safety

In addition to his research, Russell is also one of the most vocal and active AI safety researchers concerned with ensuring a stronger public understanding of the potential issues surrounding AI development.

He recently co-authored a rebuttal to an article in the MIT Technology Review, which claimed that real AI scientists weren’t worried about the existential threat of AI. Russell and his co-author summed up why it’s better to be cautious and careful than just assume all will turn out for the best:

“Our experience with Chernobyl suggests it may be unwise to claim that a powerful technology entails no risks. It may also be unwise to claim that a powerful technology will never come to fruition. On September 11, 1933, Lord Rutherford, perhaps the world’s most eminent nuclear physicist, described the prospect of extracting energy from atoms as nothing but “moonshine.” Less than 24 hours later, Leo Szilard invented the neutron-induced nuclear chain reaction; detailed designs for nuclear reactors and nuclear weapons followed a few years later. Surely it is better to anticipate human ingenuity than to underestimate it, better to acknowledge the risks than to deny them. … [T]he risk [of AI] arises from the unpredictability and potential irreversibility of deploying an optimization process more intelligent than the humans who specified its objectives.”

This summer, Russell received a grant of over $5.5 million from the Open Philanthropy Project for a new research center, the Center for Human-Compatible Artificial Intelligence, in Berkeley. Among the primary objectives of the Center will be to study this problem of value alignment, to continue his efforts toward provably beneficial AI, and to ensure we don’t make the same mistakes as King Midas.

“Look,” he says, “if you were King Midas, would you want your robot to say, ‘Everything turns to gold? OK, boss, you got it.’ No! You’d want it to say, ‘Are you sure? Including your food, drink, and relatives? I’m pretty sure you wouldn’t like that. How about this: you point to something and say ‘Abracadabra Aurificio’ or something, and then I’ll turn it to gold, OK?’”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.