AI Safety Research

Thomas G. Dietterich

Distinguished Professor (Emeritus) and Director of Intelligent Systems

School of Electrical Engineering and Computer Science, Oregon State University

tgd@cs.orst.edu

Project: Robust and Transparent Artificial Intelligence Via Anomaly Detection and Explanation

Amount Recommended: $200,000

Project Summary

In the early days of AI research, scientists studied problems such as chess and theorem proving that involved “micro worlds” that were perfectly known and predictable. Since the 1980s, AI researchers have studied problems involving uncertainty. They apply probability theory to model uncertainty about the world and use decision theory to represent the utility of the possible outcomes of proposed actions. This allows computers to make decisions that maximize expected utility by taking into account the “known unknowns”. However, when such AI systems are deployed in the real world, they can easily be confused by “unknown unknowns” and make poor decisions. This project will develop theoretical principles and AI algorithms for learning and acting safely in the presence of unknown unknowns. The algorithms will be able to detect and respond to unexpected changes in the world. They will ensure that when the AI system plans a sequence of actions, it takes into account its ignorance of the unknown unknowns. This will lead it to behave cautiously and turn to humans for help. Instead of maximizing expected utility, it will first ensure that its actions avoid unsafe outcomes and only then maximize utility. This will make AI systems much safer.

Technical Abstract

The development of AI technology has progressed from working with “known knowns” (AI planning and problem solving in deterministic, closed worlds) to working with “known unknowns” (planning and learning in uncertain environments based on probabilistic models of those environments). A critical challenge for future AI systems is to behave safely and conservatively in open worlds, where most aspects of the environment are not modeled by the AI agent: the “unknown unknowns”. Our team, with deep experience in machine learning, probabilistic modeling, and planning, will develop principles, evaluation methodologies, and algorithms for learning and acting safely in the presence of the unknown unknowns. For supervised learning, we will develop UU-conformal prediction algorithms that extend conformal prediction to incorporate nonconformity scores based on robust anomaly detection algorithms. This will enable supervised learners to behave safely in the presence of novel classes and arbitrary changes in the input distribution. For reinforcement learning, we will develop UU-sensitive algorithms that act to minimize risk due to unknown unknowns. A key principle is that AI systems must broaden the set of variables that they consider to include as many variables as possible in order to detect anomalous data points and unknown side effects of actions.
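For readers unfamiliar with conformal prediction, the following is a minimal sketch of the standard split-conformal procedure that the abstract proposes to extend. It is not the project's UU-conformal algorithm: the function names, the example scores, and the choice of epsilon are all hypothetical, and the nonconformity scores are simply passed in as numbers (in the proposed work they would draw on an anomaly detector, but the source does not say how).

```python
def p_value(calibration_scores, score):
    """Conformal p-value: share of calibration scores at least as large as `score`.

    The +1 terms count the test point itself, as in standard split conformal
    prediction.
    """
    at_least = sum(1 for c in calibration_scores if c >= score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(label_scores, calibration_scores, epsilon):
    """Return every label whose p-value exceeds the error level `epsilon`.

    `label_scores` maps each candidate label to the nonconformity score the
    test point would have under that label (hypothetical inputs). An empty
    set means the predictor abstains.
    """
    return {
        label
        for label, score in label_scores.items()
        if p_value(calibration_scores, score) > epsilon
    }


# Hypothetical calibration scores from held-out labeled data.
calibration = [i / 10 for i in range(1, 10)]

familiar = prediction_set({"cat": 0.15, "dog": 0.95}, calibration, epsilon=0.2)
novel = prediction_set({"cat": 0.95, "dog": 0.99}, calibration, epsilon=0.2)
```

In this toy run the familiar input keeps a single label, while the input that conforms poorly to every known class receives an empty set, which is the behavior one would want for a novel class.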

Making AI Safe in an Unpredictable World: An Interview with Thomas G. Dietterich

Our AI systems work remarkably well in closed worlds. That’s because these environments contain a set number of variables, making the worlds perfectly known and perfectly predictable. In these micro environments, machines only encounter objects that are familiar to them. As a result, they always know how they should act and respond. Unfortunately, these same systems quickly become confused when they are deployed in the real world, as many objects aren’t familiar to them. This is a bit of a problem because, when an AI system becomes confused, the results can be deadly.

Consider, for example, a self-driving car that encounters a novel object. Should it speed up, or should it slow down? Or consider an autonomous weapon system that sees an anomaly. Should it attack, or should it power down? Each of these examples involves life-and-death decisions, and they reveal why, if we are to deploy advanced AI systems in real-world environments, we must be confident that they will behave correctly when they encounter unfamiliar objects.

Thomas G. Dietterich, Emeritus Professor of Computer Science at Oregon State University, explains that solving this identification problem begins with ensuring that our AI systems aren’t too confident — that they recognize when they encounter a foreign object and don’t misidentify it as something that they are acquainted with. To achieve this, Dietterich asserts that we must move away from (or, at least, greatly modify) the discriminative training methods that currently dominate AI research.

However, to do that, we must first address the “open category problem.”

Understanding the Open Category Problem

When driving down the road, we can encounter a near-infinite number of anomalies. Perhaps a violent storm will arise, and hail will start to fall. Perhaps our vision will become impeded by smoke or excessive fog. Although these encounters may be unexpected, the human brain is able to easily analyze new information and decide on the appropriate course of action. We will recognize a newspaper drifting across the road and, instead of abruptly slamming on the brakes, continue on our way.

Because of the way that they are programmed, our computer systems aren’t able to do the same.

“The way we use machine learning to create AI systems and software these days generally uses something called ‘discriminative training,’” Dietterich explains, “which implicitly assumes that the world consists of only, say, a thousand different kinds of objects.” This means that, if a machine encounters a novel object, it will assume that it must be one of the thousand things that it was trained on. As a result, such systems misclassify all foreign objects.

This is the “open category problem” that Dietterich and his team are attempting to solve. Specifically, they are trying to ensure that our machines don’t assume that they have encountered every possible object, but are, instead, able to reliably detect — and ultimately respond to — new categories of alien objects.

Dietterich notes that, from a practical standpoint, this means creating an anomaly detection algorithm that assigns an anomaly score to each object detected by the AI system. That score must be compared against a set threshold and, if the anomaly score exceeds the threshold, the system will need to raise an alarm. Dietterich states that, in response to this alarm, the AI system should take a pre-determined safety action. For example, a self-driving car that detects an anomaly might slow down and pull off to the side of the road.
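The alarm logic described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Detection` type, the `respond` function, the action strings, and the numeric scores and threshold are hypothetical names and values, not part of the team's system.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str            # class predicted by the discriminative classifier
    anomaly_score: float  # higher means less like the training data


def respond(detection: Detection, threshold: float) -> str:
    """Pick an action for one detected object (hypothetical action names)."""
    if detection.anomaly_score > threshold:
        # Alarm raised: fall back to the pre-determined safety action.
        return "slow_down_and_pull_over"
    # Score is within the familiar range: trust the classifier's label.
    return f"proceed_as_{detection.label}"


familiar_action = respond(Detection("pedestrian", 0.2), threshold=0.5)
alien_action = respond(Detection("car", 0.9), threshold=0.5)
```

The hard part, as the next section explains, is not this comparison but choosing the threshold.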

Creating a Theoretical Guarantee of Safety

There are two challenges to making this method work. First, Dietterich asserts that we need good anomaly detection algorithms. Previously, in order to determine what algorithms work well, the team compared the performance of eight state-of-the-art anomaly detection algorithms on a large collection of benchmark problems.

The second challenge is to set the alarm threshold so that the AI system is guaranteed to detect a desired fraction of the alien objects, such as 99%. Dietterich says that formulating a reliable setting for this threshold is one of the most challenging research problems because there are, potentially, infinite kinds of alien objects. “The problem is that we can’t have labeled training data for all of the aliens. If we had such data, we would simply train the discriminative classifier on that labeled data,” Dietterich says.

To circumvent this labeling issue, the team assumes that the discriminative classifier has access to a representative sample of “query objects” that reflect the larger statistical population. Such a sample could, for example, be obtained by collecting data from cars driving on highways around the world. This sample will include some fraction of unknown objects, and the remaining objects belong to known object categories.

Notably, the data in the sample is not labeled. Instead, the AI system is given an estimate of the fraction of aliens in the sample. And by combining the information in the sample with the labeled training data that was employed to train the discriminative classifier, the team’s new algorithm can choose a good alarm threshold. If the estimated fraction of aliens is known to be an over-estimate of the true fraction, then the chosen threshold is guaranteed to detect the target percentage of aliens (e.g., 99%).
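One way to picture how the two data sources combine is the sketch below. It assumes that the anomaly scores of the unlabeled sample follow a mixture of the known-object score distribution and the alien score distribution, weighted by the alien fraction, so the alien distribution can be estimated by subtraction. This is a simplified illustration, not the published algorithm: the function names and toy scores are hypothetical, and the finite-sample corrections needed for a high-probability guarantee are omitted.

```python
from bisect import bisect_right


def empirical_cdf(sorted_scores, x):
    """Fraction of scores that are <= x."""
    return bisect_right(sorted_scores, x) / len(sorted_scores)


def choose_threshold(nominal_scores, mixture_scores, alien_fraction, target_recall):
    """Pick the largest threshold whose estimated miss rate stays within budget.

    nominal_scores: anomaly scores of labeled training data (known classes).
    mixture_scores: anomaly scores of the unlabeled sample (known + alien).
    alien_fraction: estimated share of aliens in the unlabeled sample.
    An alarm is assumed to fire when a score exceeds the returned threshold.
    """
    nominal = sorted(nominal_scores)
    mixture = sorted(mixture_scores)
    max_missed = 1.0 - target_recall
    threshold = float("-inf")
    for candidate in sorted(set(nominal) | set(mixture)):
        f_mix = empirical_cdf(mixture, candidate)
        f_nom = empirical_cdf(nominal, candidate)
        # Mixture model: F_mix = (1 - a) * F_nom + a * F_alien, solved for F_alien.
        f_alien = (f_mix - (1.0 - alien_fraction) * f_nom) / alien_fraction
        if f_alien <= max_missed:
            threshold = candidate
    return threshold


# Toy data: four known objects with low scores, plus one alien in the sample.
nominal = [0.1, 0.2, 0.3, 0.4]
mixture = [0.1, 0.2, 0.3, 0.4, 0.9]
tau = choose_threshold(nominal, mixture, alien_fraction=0.2, target_recall=0.99)
```

In this toy case the threshold lands at the highest known-object score, so the single alien (score 0.9) raises the alarm while no known object does.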

Ultimately, the above is the first method that can give a theoretical guarantee of safety for detecting alien objects, and a paper reporting the results was presented at ICML 2018. “We are able to guarantee, with high probability, that we can find 99% of all of these new objects,” Dietterich says.

In the next stage of their research, Dietterich and his team plan to begin testing their algorithm in a more complex setting. Thus far, they’ve been looking primarily at classification, where the system looks at an image and classifies it. Next, they plan to move to controlling an agent, like a robot or a self-driving car. “At each point in time, in order to decide what action to choose, our system will do a ‘look ahead search’ based on a learned model of the behavior of the agent and its environment. If the look ahead arrives at a state that is rated as ‘alien’ by our method, then this indicates that the agent is about to enter a part of the state space where it is not competent to choose correct actions,” Dietterich says. In response, as previously mentioned, the agent should execute a series of safety actions and request human assistance.
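A look-ahead check of this kind might be sketched as follows. The sketch makes several assumptions that are not in the source: the learned model is deterministic and is passed in as a plain function, the search is exhaustive to a fixed depth, and an action is accepted if at least one continuation stays in familiar states. The names `has_familiar_path` and `choose_action` are hypothetical.

```python
def has_familiar_path(state, depth, actions, model, is_alien):
    """True if some action sequence of length `depth` avoids alien states."""
    if is_alien(state):
        return False
    if depth <= 0:
        return True
    return any(
        has_familiar_path(model(state, a), depth - 1, actions, model, is_alien)
        for a in actions
    )


def choose_action(state, depth, actions, model, is_alien):
    """Return the first action that keeps a familiar path open, else None.

    None signals that every option leads toward alien states, so the caller
    should execute its safety actions and request human assistance.
    """
    for action in actions:
        successor = model(state, action)
        if has_familiar_path(successor, depth - 1, actions, model, is_alien):
            return action
    return None


# Toy 1-D world: positions 3 and above were never seen in training.
def toy_model(position, action):
    return position + (1 if action == "right" else -1)


picked = choose_action(2, 2, ["right", "left"], toy_model, lambda s: s >= 3)
```

Here moving right from position 2 would enter an alien state, so the search settles on moving left.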

But what does this safety action actually consist of?

Responding to Aliens

Dietterich notes that, once something is identified as an anomaly and the alarm is sounded, the nature of this fallback system will depend on the machine in question, such as whether the AI system is in a self-driving car or an autonomous weapon.

To explain how these secondary systems operate, Dietterich turns to self-driving cars. “In the Google car, if the computers lose power, then there’s a backup system that automatically slows the car down and pulls it over to the side of the road.” However, Dietterich clarifies that stopping isn’t always the best course of action. One may assume that a car should come to a halt if an unidentified object crosses its path; however, if the unidentified object happens to be a blanket of snow on a particularly icy day, hitting the brakes gets more complicated. The system would need to factor in the icy roads, any cars that may be driving behind, and whether these cars can brake in time to avoid a rear-end collision.

But if we can’t predict every eventuality, how can we expect to program an AI system so that it behaves correctly and in a way that is safe?

Unfortunately, there’s no easy answer; however, Dietterich clarifies that there are some general best practices: “There’s no universal solution to the safety problem, but obviously there are some actions that are safer than others. Generally speaking, removing energy from the system is a good idea,” he says. Ultimately, Dietterich asserts that all the work related to programming safe AI really boils down to determining how we want our machines to behave under specific scenarios. He argues that, if we are to develop a sound approach, we need to rearticulate how we characterize this problem and focus on accounting for all the factors.

Dietterich notes that “when we look at these problems, they tend to get lumped under a classification of ‘ethical decision making,’ but what they really are is problems that are incredibly complex. They depend tremendously on the context in which they are operating, the human beings, the other innovations, the other automated systems, and so on. The challenge is correctly describing how we want the system to behave and then ensuring that our implementations actually comply with those requirements.” And he concludes, “the big risk in the future of AI is the same as the big risk in any software system, which is that we build the wrong system, and so it does the wrong thing. Arthur C. Clarke in 2001: A Space Odyssey had it exactly right. The HAL 9000 didn’t ‘go rogue;’ it was just doing what it had been programmed to do.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Publications

  1. Siddiqui, A., et al. Finite Sample Complexity of Rare Pattern Anomaly Detection. Proceedings of UAI-2016 (pp. 10). 2016.

Workshops

  1. Embedded Machine Learning: November 12-14, 2015. AAAI Fall Symposium, Arlington, VA.
    • This workshop included issues of Unknown Unknowns in machine learning and more generally touched on issues at the intersection of software engineering and machine learning, including verification and validation.
  2. The Future of Artificial Intelligence: January 11-13, 2016. New York University, NY.
  3. Reliable Machine Learning in the Wild: June 23, 2016.
    • This was an ICML Workshop. This workshop discussed a wide range of issues related to engineering reliable AI systems. Among the questions discussed were (a) how to estimate causal effects under various kinds of situations (A/B tests, domain adaptation, observational medical data), (b) how to train classifiers to be robust in the face of adversarial attacks (on both training and test data), (c) how to train reinforcement learning systems with risk-sensitive objectives, especially when the model class may be misspecified and the observations are incomplete, and (d) how to guarantee that a learned policy for an MDP satisfies specified temporal logic properties. Several important engineering practices were also discussed, especially engaging a Red Team to perturb/poison data and making sure we are measuring the right data. Dietterich’s assessment is that a research community is coalescing nicely around these questions, and the quality of the work is excellent. More details of the workshop can be found at this website: https://sites.google.com/site/wildml2016/.
  4. Colloquium Series on Robust and Beneficial AI (CSRBAI): May 27-June 17, 2016.
  5. Issues Concerning AI Transparency: May 28-29, 2016
    • In many cases, it can be prohibitively difficult for humans to understand AI systems’ internal states and reasoning. This makes it more difficult to anticipate such systems’ behavior and correct errors. On the other hand, there have been striking advances in communicating the internals of some machine learning systems, and in formally verifying certain features of algorithms. These researchers would like to see how far they can push the transparency of AI systems while maintaining these systems’ capabilities.
    • Slides are up for Tom Dietterich’s overview talk at this workshop, “Issues Concerning AI Transparency” (https://intelligence.org/files/csrbai/dietterich-slides.pdf).

Presentations

  1. Dietterich, T. G. “Toward Beneficial Artificial Intelligence.” Blouin Creative Leadership Summit, NY, NY, September 21, 2015.
  2. Dietterich, T. G. “Artificial Intelligence: Progress and Challenges.” Technical and Business Perspectives on the Current and Future Impact of Machine Learning. Valencia, Spain, October 20, 2015. Press coverage in El Mundo.
  3. Dietterich, T. G. “Algorithms Among Us: The Societal Impacts of Machine Learning.” NIPS Symposium. Montreal, Canada. December 10, 2015.
  4. Dietterich, T. G. “AI in Science, Law Enforcement, and Sustainability.” The Future of Artificial Intelligence. NYU. January 11, 2016.
    • Dietterich also participated in a side meeting with Henry Kissinger on January 13 along with Max Tegmark and several other key people.
  5. Dietterich, T. G. “Steps Toward Robust Artificial Intelligence.” AAAI Conference on Artificial Intelligence, Phoenix, AZ. February 14, 2016.
  6. Dietterich, T. G. “Testing, Verification & Validation, Monitoring.” Control and Responsible Innovation in the Development of Autonomous Machines. Hastings Center, Garrison, NY. April 25, 2016.
  7. Dietterich, T. G. “Steps Toward Robust Artificial Intelligence.” Huawei STW Workshop, Shenzhen, China. May 17, 2016.
  8. Dietterich, T. G. “Steps Toward Robust Artificial Intelligence.” Distinguished Seminar, National Key Laboratory for Novel Software Technology, University of Nanjing, Nanjing, China. May 19, 2016.
  9. Fern, A., Dietterich, T. G. “Toward Explainable Uncertainty.” MIRI Colloquium Series on Robust and Beneficial Artificial Intelligence. Berkeley, CA. May 27-29, 2016.
  10. Dietterich, T. G. “Understanding and Managing Ecosystems through Artificial Intelligence.” AI For Social Good. White House OSTP Workshop. Washington, DC. June 6-7, 2016.
  11. Dietterich, T. G., et al. “Anomaly Detection: Principles, Benchmarking, Explanation, and Theory.” ICML Workshop on Anomaly Detection Keynote Speech. NY. June 24, 2016.
  12. Dietterich, T. G. “Making Artificial Intelligence Systems Robust.” Safe Artificial Intelligence. White House OSTP Workshop, Pittsburgh, PA. June 28, 2016.

Ongoing Projects/Recent Progress

  1. Research on how well conformal prediction works in the presence of unknown unknowns (specifically, unknown classes).
    • This team developed an “ambiguity” metric for measuring the success of conformal prediction in multiclass classification problems. An ideal classifier achieves 0% ambiguity by outputting a prediction set consisting of a single (correct) class label. The team applied this metric to measure the performance of conformal prediction applied to random forests and deep neural networks. For the “known known” case (where all test data points belong to classes from the training data), the team replicated previous results showing that conformal prediction works very well and achieves its stated accuracy levels. It rarely abstains (i.e., outputs the empty set of class labels). The team then tested the performance of these algorithms when they are given novel classes as test queries. They found that as the amount of training data grows, the random forest algorithm becomes very confident that the examples from novel classes actually belong to one of the known classes; that is, the algorithm gives terrible results. The team observed a similar effect for a deep net trained on CIFAR-10 and tested on NetHack sprite images. These results strongly justify the need to develop new algorithms for unknown-unknown classification. The team is preparing a paper on this for AAAI 2017.
  2. Research on developing, improving, and understanding algorithms for anomaly detection.
    • Under algorithm development, this team refined the Isolation Forest (iForest) algorithm in several ways. First, as proposed, they developed a variation of iForest that applies random rotations to the data in order to remove the axis-parallel bias of the original algorithm. Experiments show that this helps when there are high correlations between variables. Second, they addressed the question of how many trees should be in the forest. They developed optimal stopping rules that stop growing trees at the point where they can provide PAC guarantees for having found the k points with the largest anomaly scores. The team also explored replacing the mean isolation depth with more expressive statistics such as the Anderson-Darling statistic (as described in the proposal). However, this did not improve performance, and it is much more difficult to analyze, so they have not pursued it further. The team developed various online versions of iForest, but they have not yet analyzed them to obtain a formal understanding. As proposed, the team measured learning curves for several anomaly detection algorithms. The performance of these algorithms rises surprisingly rapidly (i.e., only requires relatively small sample sizes in the range of 500-2000 points). Even more surprising is that performance begins to decline thereafter. The team has developed a PAC learning theory that accounts for the rapid learning (paper published at UAI 2016 with oral presentation), and they are actively studying the “decline phenomenon”.
  3. Unanticipated Work
    • This team decided to completely rebuild their anomaly detection benchmark data sets, evaluate a wider range of algorithms, and perform a more meaningful analysis. Partly this was in response to reviewers’ comments on a previous journal draft. But the team members were also unsatisfied with aspects of their previous analysis. The new benchmark data set provides a more meaningful baseline and excludes configurations where it is impossible to do better than random. The results confirm that iForest is the state of the art and that quantile methods (one-class SVM and Support Vector Data Description) are the worst. The remaining algorithms all give very similar performance. A revised journal manuscript will be submitted soon.
    • The biggest change was this team’s decision to re-do the anomaly detection benchmarks and analysis. A second change is that, at least as of now, they are not planning to refine the “anomaly detection via overfitting” paradigm for designing anomaly detection algorithms. The team’s assessment is that iForest is fast enough and good enough for them to use as a basis for developing Unknown-Unknown conformal prediction both for supervised learning and for imitation learning, so that will be the main focus of their work in Year 2 if the grant is renewed.
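To make the rotated iForest idea from item 2 concrete, here is a small, self-contained sketch. It is not the team's code. It assumes one random rotation per tree (the source only says random rotations are applied to the data), uses the standard iForest score 2^(-mean depth / c(n)), and all function names, parameters, and the toy data are hypothetical.

```python
import math
import random


def random_rotation(dim, rng):
    """Random orthonormal basis via Gram-Schmidt on Gaussian vectors."""
    basis = []
    while len(basis) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-9:
            basis.append([x / norm for x in v])
    return basis


def rotate(point, basis):
    return [sum(x * y for x, y in zip(point, b)) for b in basis]


def avg_path_length(n):
    """c(n): average path length of an unsuccessful binary search tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return ("node", feature, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(point, tree, depth=0):
    if tree[0] == "leaf":
        # Unsplit leaves get the expected extra depth for their size.
        return depth + avg_path_length(tree[1])
    _, feature, split, left, right = tree
    child = left if point[feature] < split else right
    return path_length(point, child, depth + 1)


def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    rng = random.Random(seed)
    n = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(n))
    forest = []
    for _ in range(n_trees):
        basis = random_rotation(len(data[0]), rng)  # assumed: one rotation per tree
        sample = [rotate(p, basis) for p in rng.sample(data, n)]
        forest.append((basis, build_tree(sample, 0, max_depth, rng)))
    return forest, n


def anomaly_score(point, forest, n):
    """Score in (0, 1]; larger means the point is isolated more quickly."""
    mean_depth = sum(path_length(rotate(point, b), t) for b, t in forest) / len(forest)
    return 2.0 ** (-mean_depth / avg_path_length(n))


# Toy data: two strongly correlated variables lying near the line y = x.
data = [[i / 100.0, i / 100.0 + 0.01 * ((i * 37) % 11 - 5)] for i in range(200)]
forest, n = fit_forest(data)
inlier_score = anomaly_score([1.0, 1.0], forest, n)
outlier_score = anomaly_score([10.0, -10.0], forest, n)
```

With correlated variables like these, axis-parallel splits align poorly with the data; rotating the coordinates before each tree is built is one way to vary the split directions, which is the motivation the team gives for the variant.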