AI Safety Research

Fuxin Li

Assistant Professor

School of Electrical Engineering and Computer Science, Oregon State University

Project: Understanding when a deep network is going to be wrong

Amount Recommended: $121,642

Project Summary

Deep learning architectures have fundamentally changed the capabilities of machine learning and have benefited many applications, such as computer vision, speech recognition, and natural language processing, with influence on many other problems still to come. However, very little is understood about these networks. Months of manual tuning are required to obtain excellent performance, and the trained networks are often not robust: recent studies have shown that the error rate increases significantly with slight pixel-level perturbations to an image that are not even perceptible to the human eye.

In this proposal, the PI proposes to thoroughly study the optimization and robustness of deep convolutional networks in visual object recognition in order to better understand deep learning. This includes training procedures that will make deep learning more automatic and lead to fewer failures in training, as well as confidence estimates for when the deep network is used to make predictions on new data. The confidence estimates can be used to control the behavior of a robot employing deep learning so that it does not perform maneuvers that could be dangerous because of erroneous predictions. Understanding these aspects would also help in designing potentially more robust networks in the future.

Technical Abstract

This work will focus on predicting whether a deep convolutional neural network (CNN) has succeeded. This includes two aspects: first, to explain why and when the stochastic optimization in a deep CNN can succeed without overfitting and obtain high accuracy; second, to establish an estimate of confidence in the predictions of the deep learning architecture. These confidence estimates can be used as safeguards when such networks are deployed in real life. To establish these estimates, this work proposes to start from intuitions drawn from empirical analyses of the training procedure and model structures of deep learning. In-depth analyses will be completed for the mini-batch training procedure and model structures, illustrating the differences that each mini-batch size makes to training, as well as the low-dimensional manifold structure in the classification. From these analyses, this work will produce approaches to design and control a proper training procedure with less human intervention, as well as confidence estimates obtained by estimating the distance from the test data to the sub-manifold on which the trained network is effective.
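
The idea of a distance-based confidence estimate can be illustrated with a minimal sketch. This is not the project's method: it assumes, purely for illustration, that the distance to the nearest reference feature vector is a crude stand-in for the distance to the sub-manifold, and that an exponential decay maps distance to a confidence score. The names `nearest_distance`, `confidence`, `reference`, and `scale` are hypothetical.

```python
import math

def nearest_distance(x, reference):
    """Euclidean distance from x to the closest reference feature vector."""
    return min(math.dist(x, r) for r in reference)

def confidence(x, reference, scale=1.0):
    """Map distance to (0, 1]: 1.0 on a reference point, decaying with distance.

    Assumption: nearest-neighbour distance approximates distance to the
    region where the trained network is effective.
    """
    return math.exp(-nearest_distance(x, reference) / scale)

# Hypothetical 2-D features sampled along the line y = x (a 1-D "manifold").
reference = [(t / 10.0, t / 10.0) for t in range(11)]

near = confidence((0.5, 0.5), reference)  # lies on the sampled line
far = confidence((0.0, 1.0), reference)   # lies well off the line

print(near, far)
```

A downstream controller could then refuse to act on predictions whose confidence falls below a chosen threshold; how that threshold is set is left open here.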

Ongoing Projects

  • In the first year, Fuxin and his team mainly worked on estimating a confidence metric for a deep learning prediction. Unlike previous work, which focuses on improving classifiers so that they correctly classify adversarial examples, they focus on detecting adversarial examples by analyzing whether they come from the same distribution as normal examples. The rationale for this approach is that the space of potential adversarial examples is far larger than that of normal ones, and sampling from that distribution would require an exponential number of samples; conventional approaches are therefore unlikely to be safe enough. Under a general self-aware learning framework that the team defined, they are able to use a cascade of SVM classifiers from the early convolutional layers to estimate whether images come from an adversarial distribution or a normal distribution. A cascade classifier is harder for an adversary to counteract (hack through), because the adversary must defeat multiple levels of classifiers before it can get past the cascade. The accuracy of detecting adversarial examples, measured as area under the curve (AUC), exceeded 96%. Notably, 90% of the adversarial examples can be detected with a false positive rate of less than 10%. Once an image has been identified as corrupted, one could seek approaches to recover the original image. The researchers found that, for the type of optimal corruption they tested, a simple 3×3 average filter comes close to recovering most of the hacked images.
  • In the second year, Fuxin and his team first plan to finish the work on convolutional neural networks, conducting experiments with new adversarial generation mechanisms and other networks. They will also seek applications of the proposed framework. For example, such a confidence estimate could potentially help avoid a scenario like the Tesla crash of May 2016, in which an adversarial example (a truck with too much light) in the middle of the highway fooled the system. They plan to conduct experiments on this in the next few months.
  • Soon, Fuxin and his team will start generalizing the approach to other domains, such as temporal models (RNNs, LSTMs) and deep reinforcement learning. In reinforcement learning, the confidence estimates could play an important role in many decision-making paradigms.
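
The cascade detection and 3×3 average filtering described in the first-year work can be sketched roughly as follows. This is only an illustration under stated assumptions: the team's detector uses SVM classifiers on early convolutional layers, whereas the stages below are hypothetical hand-written threshold tests, and the "perturbation" is a synthetic checkerboard pattern rather than a real adversarial example. All function names and thresholds are invented for the sketch.

```python
def mean_filter_3x3(image):
    """Apply a 3x3 average filter to a 2-D list of numbers.

    Border pixels average over the neighbours that exist (no padding).
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [
                image[a][b]
                for a in range(max(0, i - 1), min(h, i + 2))
                for b in range(max(0, j - 1), min(w, j + 2))
            ]
            out[i][j] = sum(window) / len(window)
    return out

def high_frequency_energy(image):
    """Mean absolute difference between horizontally adjacent pixels."""
    diffs = [abs(row[j + 1] - row[j])
             for row in image for j in range(len(row) - 1)]
    return sum(diffs) / len(diffs)

def cascade_flags(image, stages):
    """Return True if any stage flags the input as adversarial.

    An input is accepted as normal only when every stage accepts it,
    so an attacker has to defeat all stages at once.
    """
    return any(stage(image) for stage in stages)

# Hypothetical stand-in stages (the source uses SVM classifiers instead).
stages = [
    lambda img: high_frequency_energy(img) > 0.05,  # unusually noisy
    lambda img: max(max(row) for row in img) > 1.0,  # out-of-range pixel
]

clean = [[0.5] * 4 for _ in range(4)]
# Synthetic high-frequency perturbation: +/-0.1 in a checkerboard pattern.
perturbed = [[0.5 + (0.1 if (i + j) % 2 else -0.1) for j in range(4)]
             for i in range(4)]

flag_clean = cascade_flags(clean, stages)
flag_perturbed = cascade_flags(perturbed, stages)
restored = mean_filter_3x3(perturbed)
flag_restored = cascade_flags(restored, stages)

print(flag_clean, flag_perturbed, flag_restored)
```

In this toy setting the averaging step smooths away most of the pixel-level pattern, which mirrors the intuition behind the reported result; it says nothing about how well the filter works against real attacks.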