Understanding when a deep network is going to be wrong
Deep learning architectures have fundamentally changed the capabilities of machine learning and have benefited many applications, including computer vision, speech recognition, and natural language processing, with their influence spreading to many other problems. However, very little is understood about these networks. Obtaining excellent performance often requires months of manual tuning, and the trained networks are frequently not robust: recent studies have shown that the error rate increases significantly under slight pixel-level perturbations of the input image that are not even perceivable by human eyes.
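As a concrete illustration of such perturbations (not a method proposed here), the fast gradient sign method of Goodfellow et al. nudges every pixel a small step in the direction that increases the classification loss. The sketch below is a minimal version assuming a PyTorch classifier `model` and a correctly classified image batch `x` with labels `y`; all of these names are hypothetical placeholders.

import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.01):
    """Return an adversarially perturbed copy of the image batch x.

    Illustrative sketch only: `model`, `x`, and `y` are assumed inputs,
    and pixel values are assumed to lie in [0, 1].
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # loss of the current (correct) prediction
    loss.backward()                        # gradient of the loss w.r.t. the pixels
    # Step each pixel by +/- epsilon in the direction that increases the loss.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

Even with epsilon as small as one or two gray levels, perturbations of this kind are typically invisible to a human observer yet can flip the network's prediction, which is the failure mode motivating this proposal.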
In this proposal, the PI proposes to thoroughly study the optimization and robustness of deep convolutional networks in visual object recognition, in order to gain a better understanding of deep learning. This includes training procedures that make deep learning more automatic and lead to fewer training failures, as well as confidence estimates for the predictions a deep network makes on new data. The confidence estimates can be used to control the behavior of a robot employing deep learning, so that it does not perform maneuvers that could be dangerous because of erroneous predictions. Understanding these aspects would also help in designing more robust networks in the future.
This work will focus on predicting whether a deep convolutional neural network (CNN) has succeeded. This involves two aspects: first, explaining why and when stochastic optimization of a deep CNN can succeed without overfitting and obtain high accuracy; second, establishing confidence estimates for the predictions of the trained network. These confidence estimates can serve as safeguards when such networks are deployed in the real world. To establish them, this work will start from intuitions drawn from empirical analyses of the training procedure and model structure of deep learning. In-depth analyses will be conducted for the mini-batch training procedure and the model structure, characterizing the effect of different mini-batch sizes on training as well as the low-dimensional manifold structure underlying the classification. Building on these analyses, this work will deliver approaches to design and control a proper training procedure with less human intervention, as well as confidence estimates obtained by estimating the distance from the test data to the submanifold on which the trained network is effective, as sketched below.
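One simple way to operationalize a distance-to-submanifold confidence estimate, shown only as a sketch and not as the proposal's specified design, is to approximate the submanifold by the training data's features in the network's penultimate layer and score a test point by its average distance to its nearest training features. The feature extractor `embed`, the stored training features, and the score mapping below are all assumptions introduced for illustration.

import numpy as np

def fit_reference_features(embed, train_images):
    """Embed the training set once; these features stand in for the
    submanifold on which the trained network is known to be effective."""
    return np.stack([embed(img) for img in train_images])

def confidence_score(embed, test_image, reference_features, k=10):
    """Return a confidence in (0, 1]: high when the test point lies close
    to the training features, low when it is far from them."""
    z = embed(test_image)
    dists = np.linalg.norm(reference_features - z, axis=1)
    knn_dist = np.sort(dists)[:k].mean()   # average distance to the k nearest training features
    return 1.0 / (1.0 + knn_dist)          # monotone mapping of distance to a bounded score

In a robotic deployment, predictions whose score falls below a calibrated threshold could be rejected or deferred to a safe fallback behavior, which is the safeguard role for confidence estimates described above.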