AI Safety Research

Thomas G. Dietterich

Distinguished Professor (Emeritus) and Director of Intelligent Systems

School of Electrical Engineering and Computer Science, Oregon State University

Project: Robust and Transparent Artificial Intelligence Via Anomaly Detection and Explanation

Amount Recommended: $200,000

Project Summary

In the early days of AI research, scientists studied problems such as chess and theorem proving that involved “micro worlds” that were perfectly known and predictable. Since the 1980s, AI researchers have studied problems involving uncertainty. They apply probability theory to model uncertainty about the world and use decision theory to represent the utility of the possible outcomes of proposed actions. This allows computers to make decisions that maximize expected utility by taking into account the “known unknowns”. However, when such AI systems are deployed in the real world, they can easily be confused by “unknown unknowns” and make poor decisions. This project will develop theoretical principles and AI algorithms for learning and acting safely in the presence of unknown unknowns. The algorithms will be able to detect and respond to unexpected changes in the world. They will ensure that when the AI system plans a sequence of actions, it takes into account its ignorance of the unknown unknowns. This will lead it to behave cautiously and turn to humans for help. Instead of maximizing expected utility, it will first ensure that its actions avoid unsafe outcomes and only then maximize utility. This will make AI systems much safer.
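The "avoid unsafe outcomes first, then maximize utility" rule described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the `Action` fields, the `risk_limit` threshold, and the example numbers are hypothetical, and the summary does not say how the risk of an unsafe outcome would actually be estimated.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Action:
    name: str
    expected_utility: float    # hypothetical estimate from the agent's model
    unsafe_probability: float  # hypothetical bound on reaching an unsafe outcome


def choose_action(actions: Sequence[Action], risk_limit: float) -> Optional[Action]:
    """Pick the best action among those considered safe.

    Returns None when no action satisfies the safety constraint, which the
    caller should treat as "stop and ask a human for help".
    """
    # Step 1: keep only actions whose estimated risk is within the limit.
    safe = [a for a in actions if a.unsafe_probability <= risk_limit]
    if not safe:
        return None
    # Step 2: only now maximize expected utility, over the safe set.
    return max(safe, key=lambda a: a.expected_utility)


candidates = [
    Action("fast_route", expected_utility=10.0, unsafe_probability=0.20),
    Action("slow_route", expected_utility=6.0, unsafe_probability=0.01),
    Action("wait", expected_utility=1.0, unsafe_probability=0.00),
]

chosen = choose_action(candidates, risk_limit=0.05)
print(chosen.name if chosen else "ask a human for help")
```

A pure expected-utility maximizer would pick `fast_route` here; the constrained rule rejects it because its estimated risk exceeds the limit, and it defers to a human when nothing passes the safety filter.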

Technical Abstract

The development of AI technology has progressed from working with “known knowns” (AI planning and problem solving in deterministic, closed worlds) to working with “known unknowns” (planning and learning in uncertain environments based on probabilistic models of those environments). A critical challenge for future AI systems is to behave safely and conservatively in open worlds, where most aspects of the environment are not modeled by the AI agent: the “unknown unknowns”. Our team, with deep experience in machine learning, probabilistic modeling, and planning, will develop principles, evaluation methodologies, and algorithms for learning and acting safely in the presence of the unknown unknowns. For supervised learning, we will develop UU-conformal prediction algorithms that extend conformal prediction to incorporate nonconformity scores based on robust anomaly detection algorithms. This will enable supervised learners to behave safely in the presence of novel classes and arbitrary changes in the input distribution. For reinforcement learning, we will develop UU-sensitive algorithms that act to minimize risk due to unknown unknowns. A key principle is that AI systems must broaden the set of variables that they consider to include as many variables as possible in order to detect anomalous data points and unknown side effects of actions.
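To make the idea of conformal prediction with an anomaly-style nonconformity score concrete, here is a small sketch of label-conditional split conformal prediction. It is not the UU-conformal algorithm proposed in the abstract, whose details are not given here. As a stand-in for a robust anomaly detector, it assumes the nonconformity score is simply the distance to a class centroid; the data, the `epsilon` level, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float]


def centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def fit(train: Dict[str, List[Point]], calib: Dict[str, List[Point]]):
    """Return per-class centroids and per-class calibration nonconformity scores."""
    centers = {label: centroid(pts) for label, pts in train.items()}
    calib_scores = {
        label: [math.dist(p, centers[label]) for p in pts]
        for label, pts in calib.items()
    }
    return centers, calib_scores


def p_value(score: float, calib_scores: Sequence[float]) -> float:
    """Conformal p-value: share of calibration scores at least as extreme."""
    at_least = sum(1 for s in calib_scores if s >= score)
    return (at_least + 1) / (len(calib_scores) + 1)


def predict_set(x: Point, centers, calib_scores, epsilon: float) -> Set[str]:
    """All labels that cannot be rejected at significance level epsilon.

    An empty set means x looks anomalous for every known class, which is
    the signal one would hope to see for a novel class.
    """
    return {
        label
        for label, center in centers.items()
        if p_value(math.dist(x, center), calib_scores[label]) > epsilon
    }


def ring(cx: float, cy: float, count: int = 10) -> List[Point]:
    """Deterministic toy points spiralling around (cx, cy)."""
    return [
        (cx + (0.1 + 0.03 * i) * math.cos(0.6 * i),
         cy + (0.1 + 0.03 * i) * math.sin(0.6 * i))
        for i in range(count)
    ]


train = {
    "a": [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2)],
    "b": [(5.0, 5.0), (5.2, 4.9), (4.9, 5.1), (5.1, 5.2)],
}
calib = {"a": ring(0.0, 0.0), "b": ring(5.0, 5.0)}
centers, calib_scores = fit(train, calib)

known = predict_set((0.1, 0.0), centers, calib_scores, epsilon=0.2)
novel = predict_set((2.5, 2.5), centers, calib_scores, epsilon=0.2)
print("near class a:", known)
print("far from both classes:", novel)
```

The point near class `a` gets the single-label set `{"a"}`, while the point far from both classes gets the empty set, so the predictor abstains instead of forcing a known label. A score taken from a classifier's confidence alone would not necessarily behave this way on novel inputs, which is the motivation for anomaly-based scores.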


Publications

  1. Siddiqui, A., et al. Finite Sample Complexity of Rare Pattern Anomaly Detection. Proceedings of UAI-2016 (pp. 10). 2016.


Workshops

  1. Embedded Machine Learning: November 12-14, 2015. AAAI Fall Symposium, Arlington, VA.
    • This workshop included issues of Unknown Unknowns in machine learning and more generally touched on issues at the intersection of software engineering and machine learning, including verification and validation.
  2. The Future of Artificial Intelligence: January 11-13, 2016. New York University, NY.
  3. Reliable Machine Learning in the Wild: June 23, 2016.
    • This was an ICML Workshop. This workshop discussed a wide range of issues related to engineering reliable AI systems. Among the questions discussed were (a) how to estimate causal effects under various kinds of situations (A/B tests, domain adaptation, observational medical data), (b) how to train classifiers to be robust in the face of adversarial attacks (on both training and test data), (c) how to train reinforcement learning systems with risk-sensitive objectives, especially when the model class may be misspecified and the observations are incomplete, and (d) how to guarantee that a learned policy for an MDP satisfies specified temporal logic properties. Several important engineering practices were also discussed, especially engaging a Red Team to perturb/poison data and making sure we are measuring the right data. Dietterich’s assessment is that a research community is coalescing nicely around these questions, and the quality of the work is excellent. More details of the workshop can be found at this website:
  4. Colloquium Series on Robust and Beneficial AI (CSRBAI): May 27-June 17, 2016.
  5. Issues Concerning AI Transparency: May 28-29, 2016
    • In many cases, it can be prohibitively difficult for humans to understand AI systems’ internal states and reasoning. This makes it more difficult to anticipate such systems’ behavior and correct errors. On the other hand, there have been striking advances in communicating the internals of some machine learning systems, and in formally verifying certain features of algorithms. These researchers would like to see how far they can push the transparency of AI systems while maintaining these systems’ capabilities.
    • Slides are up for Tom Dietterich’s overview talk at this workshop, “Issues Concerning AI Transparency”.


Presentations

  1. Dietterich, T. G. “Toward Beneficial Artificial Intelligence.” Blouin Creative Leadership Summit, NY, NY, September 21, 2015.
  2. Dietterich, T. G. “Artificial Intelligence: Progress and Challenges.” Technical and Business Perspectives on the Current and Future Impact of Machine Learning. Valencia, Spain, October 20, 2015. Press coverage in El Mundo.
  3. Dietterich, T. G. “Algorithms Among Us: The Societal Impacts of Machine Learning.” NIPS Symposium. Montreal, Canada. December 10, 2015.
  4. Dietterich, T. G. “AI in Science, Law Enforcement, and Sustainability.” The Future of Artificial Intelligence. NYU. January 11, 2016.
    • Dietterich also participated in a side meeting with Henry Kissinger on January 13 along with Max Tegmark and several other key people.
  5. Dietterich, T. G. “Steps Toward Robust Artificial Intelligence.” AAAI Conference on Artificial Intelligence, Phoenix, AZ. February 14, 2016.
  6. Dietterich, T. G. “Testing, Verification & Validation, Monitoring.” Control and Responsible Innovation in the Development of Autonomous Machines. Hastings Center, Garrison, NY. April 25, 2016.
  7. Dietterich, T. G. “Steps Toward Robust Artificial Intelligence.” Huawei STW Workshop, Shenzhen, China. May 17, 2016.
  8. Dietterich, T. G. “Steps Toward Robust Artificial Intelligence.” Distinguished Seminar, National Key Laboratory for Novel Software Technology, University of Nanjing, Nanjing, China. May 19, 2016.
  9. Fern, A., Dietterich, T. G. “Toward Explainable Uncertainty.” MIRI Colloquium Series on Robust and Beneficial Artificial Intelligence. Berkeley, CA. May 27-29, 2016.
  10. Dietterich, T. G. “Understanding and Managing Ecosystems through Artificial Intelligence.” AI For Social Good. White House OSTP Workshop. Washington, DC. June 6-7, 2016.
  11. Dietterich, T. G., et al. “Anomaly Detection: Principles, Benchmarking, Explanation, and Theory.” ICML Workshop on Anomaly Detection Keynote Speech. NY. June 24, 2016.
  12. Dietterich, T. G. “Making Artificial Intelligence Systems Robust.” Safe Artificial Intelligence. White House OSTP Workshop, Pittsburgh, PA. June 28, 2016.

Ongoing Projects/Recent Progress

  1. Research on how well conformal prediction works in the presence of unknown unknowns (specifically, unknown classes).
    • This team developed an “ambiguity” metric for measuring the success of conformal prediction in multiclass classification problems. An ideal classifier achieves 0% ambiguity by outputting a prediction set consisting of a single (correct) class label. The team applied this metric to measure the performance of conformal prediction applied to random forests and deep neural networks. For the “known known” case (where all test data points belong to classes from the training data), the team replicated previous results showing that conformal prediction works very well and achieves its stated accuracy levels. It rarely abstains (i.e., outputs the empty set of class labels). The team then tested the performance of these algorithms when they are given novel classes as test queries. They found that as the amount of training data grows, the random forest algorithm becomes very confident that the examples from novel classes actually belong to one of the known classes; that is, the algorithm gives terrible results on novel classes. The team observed a similar effect for a deep net trained on CIFAR-10 and tested on NetHack sprite images. These results strongly justify the need to develop new algorithms for unknown-unknown classification. The team is preparing a paper on this for AAAI 2017.
  2. Research on developing, improving, and understanding algorithms for anomaly detection.
    • Under algorithm development, this team refined the Isolation Forest (iForest) algorithm in several ways. First, as proposed, they developed a variation of iForest that applies random rotations to the data in order to remove the axis-parallel bias of the original algorithm. Experiments show that this helps when there are high correlations between variables. Second, they addressed the question of how many trees should be in the forest. They developed optimal stopping rules that stop growing trees at the point where they can provide PAC guarantees for having found the k points with the largest anomaly scores. The team also explored replacing the mean isolation depth with more expressive statistics such as the Anderson-Darling statistic (as described in the proposal). However, this did not improve performance, and it is much more difficult to analyze, so they have not pursued it further. The team developed various online versions of iForest, but they have not yet analyzed them to obtain a formal understanding. As proposed, the team measured learning curves for several anomaly detection algorithms. The performance of these algorithms rises surprisingly rapidly (i.e., it requires only relatively small sample sizes, in the range of 500-2000 points). Even more surprising is that performance begins to decline thereafter. The team has developed a PAC learning theory that accounts for the rapid learning (paper published at UAI 2016 with oral presentation), and they are actively studying the “decline phenomenon”.
  3. Unanticipated Work
    • This team decided to completely rebuild their anomaly detection benchmark data sets, evaluate a wider range of algorithms, and perform a more meaningful analysis. Partly this was in response to reviewers’ comments on a previous journal draft, but the team members were also unsatisfied with aspects of their previous analysis. The new benchmark data set provides a more meaningful baseline and excludes configurations where it is impossible to do better than random. The results confirm that iForest is the state of the art and that quantile methods (one-class SVM and Support Vector Data Description) are the worst. The remaining algorithms all give very similar performance. A revised journal manuscript will be submitted soon.
    • The biggest change was this team’s decision to redo the anomaly detection benchmarks and analysis. A second change is that, at least as of now, they are not planning to refine the “anomaly detection via overfitting” paradigm for designing anomaly detection algorithms. The team’s assessment is that iForest is fast enough and good enough to use as a basis for developing Unknown-Unknown conformal prediction for both supervised learning and imitation learning, so that will be the main focus of their work in Year 2 if the grant is renewed.
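For readers unfamiliar with iForest, the sketch below shows the basic isolation idea together with the random-rotation variation mentioned in the progress notes. It is a simplified 2-D illustration written for this page, not the team's implementation: each tree gets its own random rotation angle, the class name `RotatedIsolationForest` and all parameters are hypothetical, and the stopping rules, online variants, and alternative depth statistics discussed above are not included.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def rotate_point(p: Point, angle: float) -> Point:
    """Rotate a 2-D point about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def c_factor(n: int) -> float:
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


class Node:
    def __init__(self, size: int, dim: Optional[int] = None, split: float = 0.0,
                 left: "Optional[Node]" = None, right: "Optional[Node]" = None):
        self.size, self.dim, self.split = size, dim, split
        self.left, self.right = left, right


def build(points: List[Point], depth: int, max_depth: int, rng: random.Random) -> Node:
    """Recursively split on a random axis at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return Node(len(points))
    dim = rng.randrange(2)
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop instead of trying the other axis
        return Node(len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return Node(len(points), dim, split,
                build(left, depth + 1, max_depth, rng),
                build(right, depth + 1, max_depth, rng))


def path_length(node: Node, p: Point) -> float:
    """Depth at which p lands in a leaf, plus a correction for unsplit leaf points."""
    depth = 0
    while node.dim is not None:
        node = node.left if p[node.dim] < node.split else node.right
        depth += 1
    return depth + c_factor(node.size)


class RotatedIsolationForest:
    def __init__(self, n_trees: int = 100, sample_size: int = 64, seed: int = 0):
        self.n_trees, self.sample_size = n_trees, sample_size
        self.rng = random.Random(seed)
        self.trees: List[Tuple[float, Node]] = []
        self.norm = 1.0

    def fit(self, points: List[Point]) -> "RotatedIsolationForest":
        n = min(self.sample_size, len(points))
        if n < 2:
            raise ValueError("need at least two points")
        max_depth = math.ceil(math.log2(n))
        self.norm = c_factor(n)
        self.trees = []
        for _ in range(self.n_trees):
            angle = self.rng.uniform(0.0, 2.0 * math.pi)  # one rotation per tree
            sample = [rotate_point(p, angle) for p in self.rng.sample(points, n)]
            self.trees.append((angle, build(sample, 0, max_depth, self.rng)))
        return self

    def score(self, p: Point) -> float:
        """Anomaly score in (0, 1]; higher means easier to isolate."""
        depths = [path_length(tree, rotate_point(p, angle)) for angle, tree in self.trees]
        return 2.0 ** (-(sum(depths) / len(depths)) / self.norm)


data_rng = random.Random(1)
data = [(data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)) for _ in range(300)]
forest = RotatedIsolationForest().fit(data)
print("score at cluster center:", round(forest.score((0.0, 0.0)), 3))
print("score far from cluster: ", round(forest.score((8.0, 8.0)), 3))
```

Points that are isolated after only a few random splits receive scores closer to 1. Rotating the data before building each tree means the splits are no longer aligned with the original coordinate axes, which is the axis-parallel bias the rotation variant is meant to remove.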