AI Safety Research
As artificial intelligence becomes more advanced, programmers will expect to talk to computers like they talk to humans. Instead of typing out long, complex code, we’ll communicate with AI systems using natural language.
With a current model called “program synthesis,” humans can get computers to write code for them by giving them examples and demonstrations of concepts, but this model is limited. With program synthesis, computers are literalists: instead of reading between the lines and considering intentions, they just do what’s literally true, and what’s literally true isn’t always what humans want.
If you asked a computer for a word starting with the letter “a,” for example, it might just return “a.” The word “a” literallysatisfies the requirements of your question, but it’s not what you wanted. Similarly, if you asked an AI system “Can you pass the salt?” the AI might just remain still and respond, “Yes.” This behavior, while literally consistent with the requirements, is ultimately invalid because the AI didn’t pass you the salt.
Computer scientist Stuart Russell gives an example of a robot vacuum cleaner that someone instructs to “pick up as much dirt as possible.” Programmed to interpret this literally and not to consider intentions, the vacuum cleaner might find a single patch of dirt, pick it up, put it back down, and then repeatedly pick it up and put it back down – efficiently maximizing the vertical displacement of dirt, which it considers “picking up as much dirt as possible.”
It’s not hard to imagine situations in which this tendency for computers to interpret statements literally and rigidly can become extremely unsafe.
Pragmatic Reasoning: Truthful vs. Helpful
As AI systems assume greater responsibility in finance, military operations, and resource allocation, we cannot afford to have them bankrupt a city, bomb an ally country, or neglect an impoverished region because they interpret commands too literally.
To address this communication failure, Long Ouyang is working to “humanize” programming in order to prevent people from accidentally causing harm because they said something imprecise or mistaken to a computer. He explains: “As AI continues to develop, we’ll see more advanced AI systems that receive instructions from human operators – it will be important that these systems understand what the operators mean, as opposed to merely what they say.”
Ouyang has been working on improving program synthesis through studying pragmatic reasoning – the process of thinking about what someone did say as well as what he or she didn’t say. Humans do this analysis constantly when interpreting the meaning behind someone’s words. By reading between the lines, people learn what someone intends and what is helpful to them, instead of what is literally “true.”
Suppose a student asked a professor if she liked his paper, and the professor said she liked “some parts” of it. Most likely, the student would assume that the professor didn’t like other parts of his paper. After all, if the professor liked all of the paper, she would’ve said so.
This pragmatic reasoning is common sense for humans, but program synthesis won’t make the connection. In conversation, the word “some” clearly means “not all,” but in mathematical logic, “some” just means “any amount more than zero.” Thus for the computer, which only understands things in a mathematically logical sense, the fact that the professor liked some parts of the paper doesn’t rule out the possibility that she liked all parts.
To better understand how AI systems can learn to reason pragmatically and avoid these misinterpretations, Ouyang is studying how people interpret language and instructions from other people.
In one test, Ouyang gives a subject three data points – A, AAA, and AAAAA – and the subject has to work backwards to determine the rule for the sequence – i.e. what the experimenter is trying to convey with the examples. In this case, a human subject might quickly determine that all data points have an odd number of As, and so the rule is that the data points must have an odd number of As.
But there’s more to this process of determining the probability of certain rules. Cognitive scientists model our thinking process in these situations as Bayesian inference – a method of combining new evidence with prior beliefs to determine whether a hypothesis (or rule) is true.
As literal synthesizers, computers can only do a limited version of Bayesian inference. They consider how consistent the examples are with hypothesized rules, but they don’t consider how representative the examples are of the hypothesized rules. Specifically, literal synthesizers can only reason about the examples that weren’t presented in limited ways. Given the data set A, AAA, and AAAAA, a computer might logically conclude that the rule is that everything has to have the letter A. This rule is literally consistent with the examples, but it fails to represent or capture what the experimenter had in mind. Human subjects, conversely, understand that the experimenter purposely omitted the even-numbered examples AA and AAAA, and determine the rule accordingly.
By studying how humans use Bayesian inference, Ouyang is working to improve computers’ ability to recognize that the information it receives – such as the statement “I liked some parts of your paper” or the command “pick up as much dirt as possible” – was purposefully selected to convey something beyond the literal meaning. His goal is to produce a concrete tool – a pragmatic synthesizer – that people can use to more effectively communicate with computers.
The communication gap between computers and humans is one of the central problems in AI safety, and Ouyang hopes that a pragmatic synthesizer will help close this gap. If AIs can reason more deeply about what people say to them, they will more effectively create the beneficial outcomes that we want.
This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.