AI Safety Research

Benjamin Rubinstein

Senior Lecturer, Dept. Computing & Information Systems

The University of Melbourne, Australia

Project: Security Evaluation of Machine Learning Systems

Amount Recommended:    $98,532

Project Summary

Machine Learning and Artificial Intelligence underpin technologies that we rely on daily, from consumer electronics (smart phones), medical implants (continuous blood glucose monitors), websites (Facebook, Google), to the systems that defend critical infrastructure. The very characteristic that makes these systems so beneficial — adaptability — can also be exploited by sophisticated adversaries wishing to breach system security or gain an economic advantage. This project will develop usable software tools for evaluating vulnerabilities in learning systems, a first step towards general-purpose, secure machine learning.

Technical Abstract

This project aims to develop systems for the analysis of machine learning algorithms in adversarial environments. Today Machine Learning and Statistics are employed in many technologies where participants have an incentive to game the system, for example internet ad placement, cybersecurity, credit risk in finance, health analytics, and smart utility grids. However little is known about how well state-of-the-art inference techniques fare when data is manipulated by a malicious adversary. By formulating the process of evading a learned model, or manipulating training data to poison learning, as an optimization program, our approach to evaluating security reduces to one a projected subgradient descent. Our main method for solving such iterative optimizations generically, will be to employ the dynamic code analysis represented by automatic differentiation. A key output of this project will be usable software tools for evaluating the security of learning systems in general.


Cybersecurity and Machine Learning

When it comes to cybersecurity, no nation can afford to slack off. If a nation’s defense systems cannot anticipate how an attacker will try to fool them, then an especially clever attack could expose military secrets or use disguised malware to cause major networks to crash.

A nation’s defense systems must keep up with the constant threat of attack, but this is a difficult and never-ending process. It seems that the defense is always playing catch-up.

Ben Rubinstein, a professor at the University of Melbourne in Australia, asks: “Wouldn’t it be good if we knew what the malware writers are going to do next, and to know what type of malware is likely to get through the filters?”

In other words, what if defense systems could learn to anticipate how attackers will try to fool them?

Adversarial Machine Learning

In order to address this question, Rubinstein studies how to prepare machine-learning systems to catch adversarial attacks. In the game of national cybersecurity, these adversaries are often individual hackers or governments who want to trick machine-learning systems for profit or political gain.

Nations have become increasingly dependent on machine-learning systems to protect against such adversaries. Unaided by humans, machine-learning systems in anti-malware and facial recognition software have the ability to learn and improve their function as they encounter new data. As they learn, they become better at catching adversarial attacks.

Machine-learning systems are generally good at catching adversaries, but they are not completely immune to threats, and adversaries are constantly looking for new ways to fool them. Rubinstein says, “Machine learning works well if you give it data like it’s seen before, but if you give it data that it’s never seen before, there’s no guarantee that it’s going to work.”

With adversarial machine learning, security agencies address this weakness by presenting the system with different types of malicious data to test the system’s filters. The system then digests this new information and learns how to identify and capture malware from clever attackers.

Security Evaluation of Machine-Learning Systems

Rubinstein’s project is called “Security Evaluation of Machine-Learning Systems”, and his ultimate goal is to develop a software tool that companies and government agencies can use to test their defenses. Any company or agency that uses machine-learning systems could run his software against their system. Rubinstein’s tool would attack and try to fool the system in order to expose the system’s vulnerabilities. In doing so, his tool anticipates how an attacker could slip by the system’s defenses.

The software would evaluate existing machine-learning systems and find weak spots that adversaries might try to exploit – similar to how one might defend a castle.

“We’re not giving you a new castle,” Rubinstein says, “we’re just going to walk around the perimeter and look for holes in the walls and weak parts of the castle, or see where the moat is too shallow.”

By analyzing many different machine-learning systems, his software program will pick up on trends and be able to advise security agencies to either use a different system or bolster the security of their existing system. In this sense, his program acts as a consultant for every machine-learning system.

Consider a program that does facial recognition. This program would use machine learning to identify faces and catch adversaries that pretend to look like someone else.

Rubinstein explains: “Our software aims to automate this security evaluation so that it takes an image of a person and a program that does facial recognition, and it will tell you how to change its appearance so that it will evade detection or change the outcome of machine learning in some way.”

This is called a mimicry attack – when an adversary makes one instance (one face) look like another, and thereby fools a system.

To make this example easier to visualize, Rubinstein’s group built a program that demonstrates how to change a face’s appearance to fool a machine-learning system into thinking that it is another face.

In the image below, the two faces don’t look alike, but the left image has been modified so that the machine-learning system thinks it is the same as the image on the right. This example provides insight into how adversaries can fool machine-learning systems by exploiting quirks.


When Rubinstein’s software fools a system with a mimicry attack, security personnel can then take that information and retrain their program to establish more effective security when the stakes are higher.

Minimizing the Attacker’s Advantage

While Rubinstein’s software will help to secure machine-learning systems against adversarial attacks, he has no illusions about the natural advantages that attackers enjoy. It will always be easier to attack a castle than to defend it, and the same holds true for a machine-learning system. This is called the ‘asymmetry of cyberwarfare.’

“The attacker can come in from any angle. It only needs to succeed at one point, but the defender needs to succeed at all points,” says Rubinstein.

In general, Rubinstein worries that the tools available to test machine-learning systems are theoretical in nature, and put too much responsibility on the security personnel to understand the complex math involved. A researcher might redo the mathematical analysis for every new learning system, but security personnel are unlikely to have the time or resources to keep up.

Rubinstein aims to “bring what’s out there in theory and make it more applied and more practical and easy for anyone who’s using machine learning in a system to evaluate the security of their system.”

With his software, Rubinstein intends to help level the playing field between attackers and defenders. By giving security agencies better tools to test and adapt their machine-learning systems, he hopes to improve the ability of security personnel to anticipate and guard against cyberattacks.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.