AI Safety Research

Percy Liang

Assistant Professor of Computer Science and, by courtesy, of Statistics

Stanford University

Project: Predictable AI via Failure Detection and Robustness

Amount Recommended:    $255,160

Project Summary

In order for AI to be safely deployed, the desired behavior of the AI system needs to be based on well-understood, realistic, and empirically testable assumptions. From the perspective of modern machine learning, there are three main barriers to this goal. First, existing theory and algorithms mainly focus on fitting the observable outputs in the training data, which could lead, for instance, to an autonomous driving system that performs well on validation tests but does not understand the human values underlying the desired outputs. Second, existing methods are designed to handle a single specified set of testing conditions, and thus little can be said about how a system will behave in a fundamentally new setting; e.g., an autonomous driving system that performs well in most conditions may still perform arbitrarily poorly during natural disasters. Finally, most systems have no way of detecting whether their underlying assumptions have been violated: they will happily continue to predict and act even on inputs that are completely outside the scope of the system.

In this proposal, we detail a research program for addressing all three of the problems above. Just as statistical learning theory (e.g., the work of Vapnik) laid down the foundations of existing machine learning and AI techniques, allowing the field to flourish over the last 25 years, we aim to lay the groundwork for a new generation of safe-by-design AI systems, which can sustain the continued deployment of AI in society

Technical Abstract

With the pervasive deployment of machine learning algorithms in mission-critical AI systems,  it is  imperative  to  ensure  that  these  algorithms  behave  predictably  in  the  wild.   Current  machine learning  algorithms  rely  on  a  tacit  assumption  that  training  and  test  conditions  are  similar,  an assumption that is often violated due to changes in user preferences, blacking out of sensors, etc. Worse, these failures are often silent and difficult to diagnose.

We propose to develop a new generation of machine learning algorithms that come with strong static and dynamic guarantees necessary for safe deployment in open-domain settings. Our proposal focuses on three key thrusts: robustness to context change, inferring the underlying process from partial supervision, and failure detection at execution time. First, rather than learning models that predict accurately on a target distribution, we will use minimax optimization to learn models that are suitable for any target distribution within a “safe” family.  Second, while existing learning algorithms can fit the input-output behavior from one domain, they often fail to learn the underlying reason for making certain predictions; we address this with moment-based algorithms for learning latent-variable models, with a novel focus on structural properties and global guarantees. Finally, we propose using dynamic testing to detect when the assumptions underlying either of these methods fail, and trigger a reasonable fallback.  With these three points, we aim to lay down a framework for machine learning algorithms that work reliably and fail gracefully.


  1. Khani, F., et al. Unanimous prediction for 100% precision with application to learning semantic mappings. Association for Computational Linguistics (ACL), 2016.
    • This paper relates to the problem of training a system so that it is guaranteed to either predict correctly on a new input or abstain. In some sense, the system knows what it doesn’t know. These researchers performed this research in the context of semantic parsing, the problem of mapping natural language utterances to logical forms, and they showed that it is indeed possible to make this guarantee of 100% precision, under modeling assumptions. Empirically, this works on the standard US Geography question answering dataset.
  2. Steinhardt and Liang. Unsupervised Risk Estimation with only Conditional Independence Structure. Neural Information Processing Systems (NIPS), 2016.


  1. The Future of Artificial Intelligence: January 11-13, 2016. New York University, NY.
  2. Reliable Machine Learning in the Wild (ICML Workshop):  June 23, 2016. NY.
    • This workshop discussed a wide range of issues related to engineering reliable AI systems. Among the questions discussed were (a) how to estimate causal effects under various kinds of situations (A/B tests, domain adaptation, observational medical data), (b) how to train classifiers to be robust in the face of adversarial attacks (on both training and test data), (c) how to train reinforcement learning systems with risk-sensitive objectives, especially when the model class may be misspecified and the observations are incomplete, and (d) how to guarantee that a learned policy for an MDP satisfies specified temporal logic properties. Several important engineering practices were also discussed, especially engaging a Red Team to perturb/poison data and making sure we are measuring the right data. Liang’s assessment is that a research community is coalescing nicely around these questions, and the quality of the work is excellent.
    • More details of the workshop can be found at this website:
  3. Workshop on Human Interpretability in Machine Learning: June 23, 2016. ICML, New York, NY.
    • Liang gave an invited talk at this workshop. He presented two papers: “Unanimous prediction for 100% precision with application to learning semantic mappings,” and “Unanimous prediction for 100% precision with application to learning semantic mappings.”