AI Safety Research
Machine learning systems are confusing – just ask any AI researcher. Their deep neural networks operate incredibly quickly, considering thousands of possibilities in seconds before making decisions. The human brain simply can’t keep up.
When people learn to play Go, instructors can challenge their decisions and hear their explanations. Through this interaction, teachers determine the limits of a student’s understanding. But DeepMind’s AlphaGo, which recently beat the world’s champions at Go, can’t answer these questions. When AlphaGo makes an unexpected decision, it’s difficult to understand why it made that choice.
Admittedly, the stakes are low with AlphaGo: no one gets hurt if it makes an unexpected move and loses. But deploying intelligent machines that we can’t understand could set a dangerous precedent.
According to computer scientist Dan Weld, understanding and trusting machines is “the key problem to solve” in AI safety, and it’s necessary today. He explains, “Since machine learning is at the core of pretty much every AI success story, it’s really important for us to be able to understand what it is that the machine learned.”
As machine learning (ML) systems assume greater control in healthcare, transportation, and finance, trusting their decisions becomes increasingly important. If researchers can program AIs to explain their decisions and answer questions, as Weld is trying to do, we can better assess whether they will operate safely on their own.
Teaching Machines to Explain Themselves
Weld has worked on techniques that expose blind spots in ML systems, or “unknown unknowns.”
When an ML system faces a “known unknown,” it recognizes that it is uncertain about the situation. However, when it encounters an unknown unknown, it won’t even recognize that this is an uncertain situation: the system will have extremely high confidence that its result is correct, but it will be wrong. Often, classifiers have this confidence because they were “trained on data that had some regularity in it that’s not reflected in the real world,” Weld says.
Consider an ML system that has been trained to classify images of dogs, but has only been trained on images of brown and black dogs. If this system sees a white dog for the first time, it might confidently assert that it’s not a dog. This is an “unknown unknown” – trained on incomplete data, the classifier has no idea that it’s completely wrong.
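To make the white-dog scenario concrete, here is a minimal sketch of how skewed training data can produce a confident mistake. It is an illustration under stated assumptions, not Weld's method: the single "fur brightness" feature, the made-up training values, and the simple Gaussian classifier are all hypothetical.

```python
import math
from statistics import fmean, pvariance

# Hypothetical one-feature dataset: fur brightness from 0 (black) to 1 (white).
# Every training dog is dark and every non-dog is light, a regularity
# that does not hold in the real world.
DOGS = [0.10, 0.15, 0.20, 0.30, 0.35]
NOT_DOGS = [0.70, 0.75, 0.80, 0.85, 0.90]


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def prob_dog(brightness):
    """Posterior P(dog | brightness), assuming equal class priors."""
    like_dog = gaussian_pdf(brightness, fmean(DOGS), pvariance(DOGS))
    like_not = gaussian_pdf(brightness, fmean(NOT_DOGS), pvariance(NOT_DOGS))
    return like_dog / (like_dog + like_not)


white_dog = 0.95  # a real dog, but unlike any dog in the training data
p = prob_dog(white_dog)
print(f"P(dog) = {p:.6f}, confidence in 'not a dog' = {1 - p:.6f}")
```

Because the model has only ever seen dark dogs, it assigns the white dog a probability of being a dog that is very close to zero. Nothing in its output signals that the input falls outside its experience.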
ML systems can be programmed to ask for human oversight on known unknowns, but since they don’t recognize unknown unknowns, they can’t easily ask for oversight. Weld’s research team is developing techniques to find these blind spots so that humans can review them, and he believes that this work will complement explainability. “After finding unknown unknowns, the next thing the human probably wants is to know WHY the learner made those mistakes, and why it was so confident,” he explains.
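A rough sketch shows why asking for oversight works for known unknowns but not for unknown unknowns. It assumes a classifier that reports a probability that an image shows a dog. The `route` function, the 0.9 threshold, and the example probabilities are hypothetical, chosen only for illustration.

```python
def route(p_dog, threshold=0.9):
    """Return a label when the model is confident, otherwise defer to a person."""
    confidence = max(p_dog, 1.0 - p_dog)
    if confidence < threshold:
        return "ask_human"  # known unknown: the model can tell it is unsure
    return "dog" if p_dog >= 0.5 else "not_dog"


# Known unknown: the model is torn, so the case is sent to a human.
print(route(0.55))

# Unknown unknown: a white dog that the model scores as almost certainly
# not a dog. Confidence is high, so no human is ever asked.
print(route(0.001))
```

A confidence threshold can only catch cases where the model knows it is unsure. The confidently wrong answer passes straight through, which is why blind spots have to be found by some other means.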
Machines don’t “think” like humans do, but that doesn’t mean researchers can’t engineer them to explain their decisions.
One research group jointly trained an ML classifier to recognize images of birds and generate captions. If the AI recognizes a toucan, for example, the researchers can ask “why.” The neural net can then generate an explanation that the huge, colorful bill indicated a toucan.
While AI developers will prefer certain concepts explained graphically, consumers will need these interactions to involve natural language and more simplified explanations. “Any explanation is built on simplifying assumptions, but there’s a tricky judgment question about what simplifying assumptions are OK to make. Different audiences want different levels of detail,” says Weld.
Explaining the bird’s huge, colorful bill might suffice in image recognition tasks, but with medical diagnoses and financial trades, researchers and users will want more. Like a teacher-student relationship, human and machine should be able to discuss what the AI has learned and where it still needs work, drilling down on details when necessary.
“We want to find mistakes in their reasoning, understand why they’re making these mistakes, and then work towards correcting them,” Weld adds.
Managing Unpredictable Behavior
Yet, ML systems will inevitably surprise researchers. Weld explains, “The system can and will find some way of achieving its objective that’s different from what you thought.”
Governments and businesses can’t afford to deploy highly intelligent AI systems that make unexpected, harmful decisions, especially if these systems control the stock market, power grids, or data privacy. To control this unpredictability, Weld wants to engineer AIs to get approval from humans before executing novel plans.
“It’s a judgment call,” he says. “If it has seen humans executing actions 1-3, then that’s a normal thing. On the other hand, if it comes up with some especially clever way of achieving the goal by executing this rarely-used action number 5, maybe it should run that one by a live human being.”
Over time, this process will create norms for AIs, as they learn which actions are safe and which actions need confirmation.
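The approval idea can be sketched as a simple gate, under loud assumptions: the `ApprovalGate` class, its method names, and the cutoff of three observations are all hypothetical and are not taken from Weld's work. Actions the system has often seen humans perform go ahead. Rarely seen actions are held for a person to confirm, and each approval becomes part of the norm.

```python
from collections import Counter


class ApprovalGate:
    """Holds back actions that humans have rarely been seen performing."""

    def __init__(self, min_observations=3):
        self.min_observations = min_observations  # assumed cutoff for "normal"
        self.human_counts = Counter()             # how often humans did each action
        self.approved = set()                     # actions a human has signed off on

    def observe_human(self, action):
        """Record that a human was seen executing this action."""
        self.human_counts[action] += 1

    def needs_approval(self, action):
        """True if the action is novel and has not yet been approved."""
        if action in self.approved:
            return False
        return self.human_counts[action] < self.min_observations

    def record_decision(self, action, approved):
        """Remember a human's verdict so approved actions become normal."""
        if approved:
            self.approved.add(action)


gate = ApprovalGate()
for action in [1, 2, 3] * 4:   # humans routinely execute actions 1-3
    gate.observe_human(action)
gate.observe_human(5)          # action 5 is rarely used

print(gate.needs_approval(2))  # routine action
print(gate.needs_approval(5))  # rare action: run it by a live human
gate.record_decision(5, approved=True)
print(gate.needs_approval(5))  # approved once, now treated as a norm
```

Counting observations is only one possible way to decide what is novel. As Weld notes, where to draw that line is a judgment call.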
Implications for Current AI Systems
The people who use AI systems often misunderstand their limitations. The doctor using an AI to catch disease hasn’t trained the AI and can’t understand its machine learning. And the AI system, not programmed to explain its decisions, can’t communicate problems to the doctor.
Weld wants to see an AI system that interacts with a pre-trained ML system and learns how the pre-trained system might fail. This system could analyze the doctor’s new diagnostic software to find its blind spots, such as its unknown unknowns. Explainable AI software could then enable the AI to converse with the doctor, answering questions and clarifying uncertainties.
And the applications extend to finance algorithms, personal assistants, self-driving cars, and even predicting recidivism in the legal system, where explanation could help root out bias. ML systems are so complex that humans may never be able to understand them completely, but this back-and-forth dialogue is a crucial first step.
“I think it’s really about trust and how can we build more trustworthy AI systems,” Weld explains. “The more you interact with something, the more shared experience you have, the more you can talk about what’s going on. I think all those things rightfully build trust.”
This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.