Artificial Intelligence and the King Midas Problem

Value alignment. It’s a phrase that often pops up in discussions about the safety and ethics of artificial intelligence. How can scientists create AI with goals and values that align with those of the people it interacts with?
Very simple robots with very constrained tasks do not need goals or values at all. Although the Roomba’s designers know you want a clean floor, Roomba doesn’t: it simply executes a procedure that its designers predict will work, most of the time. If your kitten leaves a messy pile on the carpet, Roomba will dutifully smear it all over the living room. If we keep programming smarter and smarter robots, then by the late 2020s you may be able to ask your wonderful domestic robot to cook a tasty, high-protein dinner. But if you forgot to buy any meat, you may come home to a hot meal only to find that the aforementioned cat has mysteriously vanished. The robot, designed for chores, doesn’t understand that the sentimental value of the cat exceeds its nutritional value.
AI and King Midas
Stuart Russell, a renowned AI researcher, compares the challenge of defining a robot’s objective to the King Midas myth. “The robot,” says Russell, “has some objective and pursues it brilliantly to the destruction of mankind. And it’s because it’s the wrong objective. It’s the old King Midas problem.”
This is one of the big problems in AI safety that Russell is trying to solve. “We’ve got to get the right objective,” he explains, “and since we don’t seem to know how to program it, the right answer seems to be that the robot should learn, from interacting with and watching humans, what it is humans care about.”
Russell works from the assumption that the robot will solve whatever formal problem we define. Rather than assuming that the robot should optimize a given objective, Russell defines the problem as a two-player game (“game” as used by economists, meaning a decision problem with multiple agents) called cooperative inverse reinforcement learning (CIRL).
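In the published CIRL formulation (a sketch of the definition; the notation here may differ slightly from the paper’s), the game is a two-player Markov game with identical payoffs:

```latex
M = \langle S,\ \{A^{H}, A^{R}\},\ T,\ \Theta,\ R,\ P_0,\ \gamma \rangle
```

Here S is the set of world states; A^H and A^R are the human’s and the robot’s action sets; T(s' | s, a^H, a^R) gives the state transitions; R(s, a^H, a^R; θ) is a reward function, parameterized by θ in Θ, that both players receive; P_0 is a prior distribution over the initial state and θ; and γ is a discount factor. The crucial asymmetry is that the human observes θ and the robot does not, so the only way the robot can reliably earn reward is by inferring θ from the human’s behavior.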
A CIRL game includes a person and a robot: the robot’s only purpose is to make the person happy, but it doesn’t know what the person wants. Fortunately, it can learn more about what the person wants by observing her behavior. For example, if a robot observed the human’s morning routine, it should discover how important coffee is, not to itself, of course (we don’t want robots drinking coffee), but to the human. Then, it will make coffee for the person without being asked.
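A toy sketch can make the learning step concrete. The snippet below is not the CIRL algorithm itself, and every name and number in it is invented for illustration; it only shows the underlying idea that observed behavior is Bayesian evidence about hidden preferences:

```python
# Toy Bayesian preference inference: the robot starts uncertain about
# whether the human values morning coffee and updates its belief from
# observed behavior. Illustrative only; not the CIRL algorithm itself.

# P(observed action | hypothesis about the human's preference)
LIKELIHOOD = {
    "values_coffee": {"brews_coffee": 0.9, "skips_coffee": 0.1},
    "indifferent":   {"brews_coffee": 0.3, "skips_coffee": 0.7},
}

def update(belief, observation):
    """One Bayes update: belief maps hypothesis -> probability."""
    unnormalized = {h: p * LIKELIHOOD[h][observation] for h, p in belief.items()}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

belief = {"values_coffee": 0.5, "indifferent": 0.5}  # uniform prior
for morning in ["brews_coffee", "brews_coffee", "brews_coffee"]:
    belief = update(belief, morning)

print(belief)  # ~{'values_coffee': 0.96, 'indifferent': 0.04}
if belief["values_coffee"] > 0.8:
    print("Robot decides: make coffee before the human wakes up.")
```

After a few mornings of watching the human brew coffee, the posterior concentrates on “values coffee,” and the robot can act on that belief without being asked.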
What about the example of King Midas, who thought he wanted as much gold as possible? How does the approach deal with someone stating their wishes incorrectly?
“We don’t have to know what we want. That’s the whole point. It’s hard to know what we want, and what we ask for is evidence for what we really want, but it’s not ground truth,” says Russell.
The robot should learn what people want by watching their actions. Then, if someone asks for something that’s technically incorrect, the robot will understand their underlying wish. Russell adds, “The robot, if it had been there, would have told Midas that he didn’t really want everything turned to gold, maybe just a few choice objects that he might point at from time to time and say the magic word.”
AI Off-Switch
Russell and his Berkeley colleagues also recently announced further progress toward ensuring safe AI, with a paper on ensuring an AI’s off-switch is always accessible. An off-switch is often suggested as a way to guarantee a robot doesn’t harm humans: if the robot goes bad, just turn it off! But what if the AI anticipates that and disables its off-switch?
Russell and his team propose solving this by building a level of uncertainty into the robot’s objective. They created another CIRL game in which the robot chooses either to announce its intentions, giving the human a chance to switch it off, or to plunge ahead with its plan, bypassing human oversight.
They showed that a robot with a fixed objective will take the second choice and disable its off-switch. But if there is uncertainty about the objective, the robot will reason that a human decision to switch the robot off would imply the announced intention is undesirable. Because its goal is to make the human happy, the robot accepts being switched off. The robot has a positive incentive not to disable the off-switch, and that incentive is directly tied to the robot’s uncertainty about the human’s true objective.
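A toy calculation illustrates why the incentive appears. In the sketch below (our illustration, not the paper’s model; the Gaussian belief and the assumption of a perfectly rational human are simplifying assumptions), the robot is uncertain about the utility u the human assigns to its plan, and compares acting immediately, switching itself off, and deferring to the human:

```python
# Toy off-switch incentive: the robot is uncertain about the utility u
# the human assigns to its plan. It can act immediately, switch itself
# off, or announce the plan and let the human decide. The Gaussian
# belief and all numbers here are illustrative assumptions.
import random

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]  # belief over u

act_now   = sum(samples) / len(samples)   # E[u]: just do it
shut_down = 0.0                           # switching off yields nothing
# Deferring: a rational human approves the plan only when u > 0, so the
# robot gets u when u > 0 and 0 when the human switches it off.
defer = sum(max(u, 0.0) for u in samples) / len(samples)

print(f"act immediately: {act_now:+.3f}")
print(f"switch self off: {shut_down:+.3f}")
print(f"defer to human:  {defer:+.3f}")
# Deferring dominates: E[max(u, 0)] >= max(E[u], 0), with a strict gain
# whenever the robot is genuinely unsure whether u is positive.
```

Deferring wins because letting the human decide filters out exactly the bad outcomes; the less certain the robot is that the human approves, the more that filtering is worth, which is the positive incentive described above.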
Ensuring AI Safety
In addition to his research, Russell is one of the most vocal and active AI safety researchers working to ensure a stronger public understanding of the potential issues surrounding AI development.
He recently co-authored a rebuttal to an article in the MIT Technology Review, which claimed that real AI scientists weren’t worried about the existential threat of AI. Russell and his co-author summed up why it’s better to be cautious and careful than just assume all will turn out for the best:
“Our experience with Chernobyl suggests it may be unwise to claim that a powerful technology entails no risks. It may also be unwise to claim that a powerful technology will never come to fruition. On September 11, 1933, Lord Rutherford, perhaps the world’s most eminent nuclear physicist, described the prospect of extracting energy from atoms as nothing but ‘moonshine.’ Less than 24 hours later, Leo Szilard invented the neutron-induced nuclear chain reaction; detailed designs for nuclear reactors and nuclear weapons followed a few years later. Surely it is better to anticipate human ingenuity than to underestimate it, better to acknowledge the risks than to deny them. … The risk arises from the unpredictability and potential irreversibility of deploying an optimization process more intelligent than the humans who specified its objectives.”
This summer, Russell received a grant of over $5.5 million from the Open Philanthropy Project for a new research center, the Center for Human-Compatible Artificial Intelligence, in Berkeley. Among the primary objectives of the Center will be to study this problem of value alignment, to continue his efforts toward provably beneficial AI, and to ensure we don’t make the same mistakes as King Midas.
“Look,” he says, “if you were King Midas, would you want your robot to say, ‘Everything turns to gold? OK, boss, you got it.’ No! You’d want it to say, ‘Are you sure? Including your food, drink, and relatives? I’m pretty sure you wouldn’t like that. How about this: you point to something and say “Abracadabra Aurificio” or something, and then I’ll turn it to gold, OK?’”
This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.