AI Safety Research

Fuxin Li

Assistant Professor

School of Electrical Engineering and Computer Science, Oregon State University

lif@eecs.oregonstate.edu

Project: Understanding when a deep network is going to be wrong

Amount Recommended: $121,642

Project Summary

Deep learning architectures have fundamentally changed the capabilities of machine learning and benefited many applications such as computer vision, speech recognition, and natural language processing, with their influence now extending to many other problems. However, very little is understood about those networks. Months of manual tuning are required to obtain excellent performance, and the trained networks are often not robust: recent studies have shown that the error rate increases significantly with slight pixel-level perturbations of an image that are not even perceptible to the human eye.

In this proposal, the PI proposes to thoroughly study the optimization and robustness of deep convolutional networks in visual object recognition, in order to gain more understanding of deep learning. This includes training procedures that will make deep learning more automatic and lead to fewer failures in training, as well as confidence estimates for when the deep network is used to make predictions on new data. The confidence estimates can be used to control the behavior of a robot employing deep learning so that it will not go on to perform maneuvers that could be dangerous because of erroneous predictions. Understanding these aspects would also be helpful in designing potentially more robust networks in the future.

Technical Abstract

This work will focus on predicting whether a deep convolutional neural network (CNN) has succeeded. This involves two aspects. The first is to explain why and when the stochastic optimization in a deep CNN can succeed without overfitting and obtain high accuracy. The second is to establish an estimate of confidence in the predictions of the deep learning architecture. Those estimates of confidence can be used as safeguards when utilizing those networks in real life. To establish those estimates, this work proposes to start from intuitions drawn from empirical analyses of the training procedure and model structures of deep learning. In-depth analyses will be completed for the mini-batch training procedure and model structures, illustrating the differences each mini-batch size makes to the training, as well as the low-dimensional manifold structure in the classification. From those analyses, this work will result in approaches to design and control a proper training procedure with less human intervention, as well as confidence estimates obtained by estimating the distance of the testing data to the sub-manifold that the trained network is effective on.
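As a rough illustration of the distance-based confidence idea, the sketch below uses the distance from a test point to its nearest training feature vector as a crude stand-in for "distance to the sub-manifold." This is an assumption made only for illustration: the abstract does not say how the distance is estimated, and the function names, the toy feature vectors, and the 0.5 threshold are all hypothetical.

```python
import math

def nearest_distance(x, training_features):
    """Distance from x to the closest training feature vector.

    Assumption: nearest-neighbor distance is used here only as a simple
    proxy for distance to the region the network was trained on.
    """
    return min(math.dist(x, t) for t in training_features)

def is_confident(x, training_features, max_distance):
    """Treat a prediction as trustworthy only if x lies near the training data."""
    return nearest_distance(x, training_features) <= max_distance

# Toy 2-D feature vectors (made up for this example).
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

near_point = (0.1, 0.1)   # close to the training data
far_point = (5.0, 5.0)    # far from anything seen in training

print(is_confident(near_point, train, max_distance=0.5))
print(is_confident(far_point, train, max_distance=0.5))
```

In a real system the feature vectors would presumably come from a layer of the trained network, and the threshold would be chosen on held-out data; both choices are left open here.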

Making Deep Learning More Robust

Imagine how much more efficient lawyers could be if they had the time to read every legal book ever written and review every case ever brought to court. Imagine doctors with the ability to study every advancement published across the world’s medical journals, or consult every medical case, ever. Unfortunately, the human brain cannot store that much information, and it would take decades to achieve these feats.

But a computer, one specifically designed to work like the human mind, could.

Deep learning neural networks are designed to mimic the human brain’s neural connections. They are capable of learning through continuous exposure to huge amounts of data. This allows them to recognize patterns, comprehend complex concepts, and translate high-level abstractions. These networks consist of many layers, each having a different set of weights. The deeper the network, the stronger it is.

Current applications for these networks include medical diagnosis, robotics and engineering, face recognition, and automotive navigation. However, deep learning is still in development – not surprisingly, it is a huge undertaking to get machines to think like humans. In fact, very little is understood about these networks, and months of manual tuning are often required for obtaining excellent performance.

Fuxin Li, assistant professor at the Oregon State University School of Electrical Engineering and Computer Science, and his team are taking on the accuracy of these neural networks under adversarial conditions. Their research focuses on the basic machine learning aspects of deep learning, and how to make general deep learning more robust.

To try to better understand when a deep convolutional neural network (CNN) is going to be right or wrong, Li’s team had to establish an estimate of confidence in the predictions of the deep learning architecture. Those estimates can be used as safeguards when utilizing the networks in real life.

“Basically,” explains Li, “trying to make deep learning increasingly self-aware – to be aware of what type of data it has seen, and what type of data it could work on.”

The team looked at recent advances in deep learning, which have greatly improved the capability to recognize images automatically. Those networks, albeit very resistant to overfitting, were discovered to completely fail if some of the pixels in such images were perturbed via an adversarial optimization algorithm.

To a human observer, the image in question may look fine, but the deep network sees otherwise. According to the researchers, those adversarial examples are dangerous if a deep network is used in any critical real-world application, such as autonomous driving. If the result of the network can be hacked, wrong authentications and other devastating effects would be unavoidable.

In a departure from previous perspectives that focused on improving the classifiers to correctly classify the adversarial examples, the team focused on detecting those adversarial examples by analyzing whether they come from the same distribution as the normal examples. The accuracy for detecting adversarial examples, measured as area under the curve, exceeded 96%. Notably, 90% of the adversarial examples can be detected with a false positive rate of less than 10%.
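To make the two detection figures above concrete, here is a minimal sketch of how an area-under-curve score and a detection rate at a fixed false positive rate can be computed from a detector's scores. The scores and labels are made-up toy values, and the function names (roc_points, auc, tpr_at_fpr) are hypothetical; none of this is the team's code or data.

```python
def roc_points(scores, labels):
    """Return ROC points (fpr, tpr) as the threshold sweeps from high to low.

    labels: 1 = adversarial (positive class), 0 = normal (negative class).
    A higher score means "more likely adversarial".
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Handle tied scores together so they share one ROC point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

def tpr_at_fpr(points, max_fpr):
    """Best detection rate achievable without exceeding max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)

# Toy detector outputs: 4 adversarial and 4 normal examples.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

points = roc_points(scores, labels)
print(auc(points))              # 0.875 for this toy data
print(tpr_at_fpr(points, 0.1))  # 0.5 for this toy data
```

Read this way, the article's "90% of adversarial examples detected at under 10% false positives" corresponds to a single point on such a curve, while the 96% figure summarizes the whole curve.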

The benefits of this research are numerous. It is vital for a neural network to be able to identify whether an example comes from a normal or an adversarial distribution. Such knowledge, if available, will help significantly to control the behavior of robots employing deep learning. A reliable procedure can prevent robots from behaving in an undesirable manner because of false perceptions they have formed about the environment.

Li gives one example: “In robotics there’s this big issue about robots not doing something based on erroneous perception. It’s important for a robot to know that it’s not making a confident perception. For example, if [the robot] is saying there’s an object over there, but it’s actually a wall, he’ll go to fetch that object, and then he hits a wall.”

Hopefully, Li says, that won’t happen. However, current software and machine learning have relied mostly on prediction confidence within the original machine learning framework. Basically, the testing and training data are assumed to be drawn independently from the same distribution, and that assumption can lead to incorrect conclusions.

Better confidence estimates could potentially help avoid incidents such as the Tesla crash of May 2016, where an adversarial example (a truck with too much light) in the middle of the highway fooled the system. A confidence estimate could potentially solve that issue. But first, the computer must be smarter. The computer has to learn to detect objects and differentiate, say, a tree from another vehicle.

“To make it really robust, you need to account for unknown objects. Something weird may hit you. A deer may jump out.” The network can’t be taught every unexpected situation, says Li, “so you need it to discover them without knowledge of what they are. That’s something that we do. We try to bridge the gap.”

The proposed training procedures are intended to make deep learning more automatic and lead to fewer failures, while the confidence estimates apply when the deep network is used to make predictions on new data. Most of the training data, explains Li, comes from stock images. However, these are flat images, very different from what a robot would normally see in day-to-day life. It’s difficult to get a 360-degree view just by looking at photos.

“There will be a big difference between the thing [the robot] trains on and the thing it really sees. So then, it is important for the robot to understand that it can predict some things confidently, and others it cannot,” says Li. “[The robot] needs to understand that it probably predicted wrong, so as not to act too aggressively toward its prediction.” This can only be achieved with a more self-aware framework, which is what Li is trying to develop with this grant.

Further, these estimates can be used to control the behavior of a robot employing deep learning so that it will not go on to perform maneuvers that could be dangerous because of erroneous predictions. Understanding these aspects would also be helpful in designing potentially more robust networks in the future.

Soon, Li and his team will start generalizing the approach to other domains, such as temporal models (RNNs, LSTMs) and deep reinforcement learning. In reinforcement learning, the confidence estimates could play an important role in many decision-making paradigms.

Li’s most recent update on this work can be found here.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Ongoing Projects

  • In the first year, Fuxin and his team mainly worked on estimating a confidence metric for a deep learning prediction. Unlike previous perspectives that focus on improving the classifiers to correctly classify the adversarial examples, they focus on detecting those adversarial examples by analyzing whether they come from the same distribution as the normal examples. The rationale for this approach is that the space of potential adversarial examples is far larger than that of normal ones, and sampling from that distribution would require an exponential number of samples, so conventional approaches are unlikely to be safe enough. Under a general self-aware learning framework that the team defined, they are able to use a cascade of SVM classifiers on the early convolutional layers to estimate whether images come from an adversarial distribution or a normal distribution. A cascade classifier makes it harder for an adversary to counteract (hack through) the detector, because the adversary needs to defeat multiple levels of classifiers before being able to cheat the cascade. The accuracy of detecting adversarial examples, measured as area under the curve, exceeded 96%. Notably, 90% of the adversarial examples can be detected with a false positive rate of less than 10%. Once an image has been identified as corrupted, one could seek approaches to recover the original image. The researchers found that, for the type of optimal corruption they tested, a simple 3×3 average filter is able to recover most of the hacked images.
  • In the second year, Fuxin and his team first plan to finish the work on convolutional neural networks, conducting experiments with new adversarial generation mechanisms and other networks. They will also seek applications of the proposed framework. For example, such a confidence estimate could potentially help avoid a scenario like the Tesla crash of May 2016, where an adversarial example (a truck with too much light) in the middle of the highway fooled the system. They plan to conduct some experiments on that in the next few months.
  • Soon, Fuxin and his team will start generalizing the approach to other domains such as temporal models (RNNs, LSTMs) and deep reinforcement learning. In reinforcement learning, the confidence estimates could play an important role in many decision-making paradigms.
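The first-year results mention two mechanisms that are easy to picture in code: a cascade of per-layer classifiers and a 3×3 average filter. The sketch below is illustrative only. The stages are hypothetical linear scoring rules standing in for trained SVMs, the rule "flag the image if any stage flags it" is an assumption about how the cascade combines its stages, and the weights, features, and image values are made up.

```python
def linear_stage(weights, bias):
    """Hypothetical stand-in for one trained linear SVM in the cascade.

    Returns a function that flags a feature vector as adversarial when
    its linear score is positive.
    """
    def stage(features):
        score = sum(w * x for w, x in zip(weights, features)) + bias
        return score > 0
    return stage

def cascade_is_adversarial(stage_features, stages):
    """Assumed cascade rule: flag the input if ANY stage flags it.

    stage_features[i] holds the features taken from the layer that
    stages[i] inspects, so an attacker must get past every stage.
    """
    return any(stage(f) for stage, f in zip(stages, stage_features))

def average_filter_3x3(image):
    """Apply a 3x3 mean filter to a 2-D list of pixel values.

    Border pixels average over the neighbors that fall inside the image.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
            ]
            out[r][c] = sum(window) / len(window)
    return out

# Two toy stages looking at features from two different layers.
stages = [linear_stage([1.0, -1.0], 0.0), linear_stage([0.5], -1.0)]

clean = [[0.2, 0.5], [1.0]]    # neither stage fires
hacked = [[0.2, 0.5], [3.0]]   # second stage fires

print(cascade_is_adversarial(clean, stages))
print(cascade_is_adversarial(hacked, stages))

# A single bright pixel is spread out (and damped) by the filter.
image = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
smoothed = average_filter_3x3(image)
print(smoothed[1][1], smoothed[0][0])
```

The filter example shows why averaging can undo small pixel-level corruption: an isolated perturbation is diluted across its neighborhood, while smooth image content changes little.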