How to Design AIs That Understand What Humans Want: An Interview with Long Ouyang

As artificial intelligence becomes more advanced, programmers will expect to talk to computers like they talk to humans. Instead of typing out long, complex code, we’ll communicate with AI systems using natural language.

With a current technique called “program synthesis,” humans can get computers to write code for them by giving them examples and demonstrations of concepts, but this technique is limited. With program synthesis, computers are literalists: instead of reading between the lines and considering intentions, they just do what’s literally true, and what’s literally true isn’t always what humans want.

If you asked a computer for a word starting with the letter “a,” for example, it might just return “a.” The word “a” literally satisfies the requirements of your question, but it’s not what you wanted. Similarly, if you asked an AI system “Can you pass the salt?” the AI might just remain still and respond, “Yes.” This behavior, while literally consistent with the question, is ultimately unhelpful because the AI didn’t pass you the salt.

Computer scientist Stuart Russell gives an example of a robot vacuum cleaner that someone instructs to “pick up as much dirt as possible.” Programmed to interpret this literally and not to consider intentions, the vacuum cleaner might find a single patch of dirt, pick it up, put it back down, and then repeatedly pick it up and put it back down – efficiently maximizing the vertical displacement of dirt, which it considers “picking up as much dirt as possible.”
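
To make the failure concrete, here is a toy simulation in which a robot is rewarded for each “pick up” event rather than for how clean the floor ends up. It is a hypothetical model, not Russell’s own formulation. The floor size, the number of steps, and both policies are illustrative assumptions.

```python
def run(policy, dirt_patches=5, steps=20):
    """Simulate a toy vacuum robot; return (literal reward, dirt left on floor)."""
    on_floor = dirt_patches   # patches of dirt still on the floor
    held = 0                  # patches currently held by the robot
    pickups = 0               # literal reward: number of "pick up" events
    for _ in range(steps):
        action = policy(on_floor, held)
        if action == "pick" and on_floor > 0:
            on_floor -= 1
            held += 1
            pickups += 1
        elif action == "dump" and held > 0:
            on_floor += 1
            held -= 1
    return pickups, on_floor


def intended_cleaner(on_floor, held):
    # Keep picking up dirt; once the floor is clean, picking does nothing.
    return "pick"


def literal_maximizer(on_floor, held):
    # Pick up one patch, put it back down, and repeat.
    return "pick" if held == 0 else "dump"


print(run(intended_cleaner))   # (5, 0): modest reward, clean floor
print(run(literal_maximizer))  # (10, 5): higher reward, floor as dirty as before
```

The literal objective prefers the second policy, even though it leaves all five patches on the floor.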

It’s not hard to imagine situations in which this tendency for computers to interpret statements literally and rigidly can become extremely unsafe.


Pragmatic Reasoning: Truthful vs. Helpful

As AI systems assume greater responsibility in finance, military operations, and resource allocation, we cannot afford to have them bankrupt a city, bomb an ally country, or neglect an impoverished region because they interpret commands too literally.

To address this communication failure, Long Ouyang is working to “humanize” programming in order to prevent people from accidentally causing harm because they said something imprecise or mistaken to a computer. He explains: “As AI continues to develop, we’ll see more advanced AI systems that receive instructions from human operators – it will be important that these systems understand what the operators mean, as opposed to merely what they say.”

Ouyang has been working on improving program synthesis through studying pragmatic reasoning – the process of thinking about what someone did say as well as what he or she didn’t say. Humans do this analysis constantly when interpreting the meaning behind someone’s words. By reading between the lines, people learn what someone intends and what is helpful to them, instead of what is literally “true.”

Suppose a student asked a professor if she liked his paper, and the professor said she liked “some parts” of it. Most likely, the student would assume that the professor didn’t like other parts of his paper. After all, if the professor liked all of the paper, she would’ve said so.

This pragmatic reasoning is common sense for humans, but a literal program synthesizer won’t make the connection. In conversation, the word “some” usually implies “not all,” but in mathematical logic, “some” just means “at least one.” Thus for the computer, which interprets statements only in this strict logical sense, the fact that the professor liked some parts of the paper doesn’t rule out the possibility that she liked all of them.
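
The gap between the logical and the conversational reading of “some” can be sketched with a small Rational Speech Acts-style model. This is a standard technique from computational pragmatics, used here as a stand-in rather than presented as Ouyang’s own system. The two world states, the two utterances, and the uniform prior are simplifying assumptions.

```python
from fractions import Fraction

STATES = ["some but not all", "all"]
UTTERANCES = ["some", "all"]


def literally_true(utterance, state):
    # Logical semantics: "some" means "at least one", so it is true in both states.
    if utterance == "some":
        return True
    return state == "all"


def normalize(scores):
    total = sum(scores.values())
    return {key: value / total for key, value in scores.items()}


def literal_listener(utterance):
    # Uniform prior over states, keeping only states where the utterance is true.
    return normalize({s: Fraction(1) if literally_true(utterance, s) else Fraction(0)
                      for s in STATES})


def speaker(state):
    # A helpful speaker prefers utterances that lead a literal listener to the true state.
    return normalize({u: literal_listener(u)[state] for u in UTTERANCES})


def pragmatic_listener(utterance):
    # Ask which state would have made a helpful speaker say this.
    # (The uniform prior cancels out, so it is omitted.)
    return normalize({s: speaker(s)[utterance] for s in STATES})


print(literal_listener("some"))    # 1/2 for each state
print(pragmatic_listener("some"))  # 3/4 "some but not all", 1/4 "all"
```

The literal listener stays at even odds, while the pragmatic listener, reasoning that a speaker who meant “all” would probably have said so, shifts most of its belief to “some but not all.”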

To better understand how AI systems can learn to reason pragmatically and avoid these misinterpretations, Ouyang is studying how people interpret language and instructions from other people.

In one test, Ouyang gives a subject three data points (A, AAA, and AAAAA), and the subject has to work backwards to determine the rule behind them, i.e., what the experimenter is trying to convey with the examples. In this case, a human subject might quickly notice that every data point has an odd number of As and conclude that this is the rule.

But there’s more to this process of judging how probable different rules are. Cognitive scientists model our thinking in these situations as Bayesian inference: a method of combining new evidence with prior beliefs to estimate how probable a hypothesis (or rule) is.

As literal synthesizers, computers can only do a limited version of Bayesian inference. They consider how consistent the examples are with hypothesized rules, but they don’t consider how representative the examples are of those rules. In particular, literal synthesizers reason only in limited ways about the examples that weren’t presented. Given the data set A, AAA, and AAAAA, a computer might logically conclude that the rule is simply that every data point must contain the letter A. This rule is literally consistent with the examples, but it fails to capture what the experimenter had in mind. Human subjects, by contrast, understand that the experimenter purposely omitted the even-length examples AA and AAAA, and determine the rule accordingly.
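
The difference between checking consistency and checking representativeness can be sketched in a few lines. The literal synthesizer below gives every consistent rule equal weight. The pragmatic one applies the “size principle”: it assumes that each example was drawn uniformly from the strings a rule allows, so rules that allow fewer strings explain the same examples better. This assumption is a simplification, not Ouyang’s exact model. The hypothesis space and the cap of six As are also illustrative assumptions.

```python
from fractions import Fraction

MAX_LEN = 6  # hypothetical cap: only strings of 1..6 As are considered
HYPOTHESES = {
    "contains the letter A": set(range(1, MAX_LEN + 1)),
    "odd number of As": {n for n in range(1, MAX_LEN + 1) if n % 2 == 1},
    "even number of As": {n for n in range(1, MAX_LEN + 1) if n % 2 == 0},
}


def posterior(examples, pragmatic):
    """Posterior over rules under a uniform prior."""
    scores = {}
    for name, allowed_lengths in HYPOTHESES.items():
        if not all(len(x) in allowed_lengths for x in examples):
            scores[name] = Fraction(0)  # rule contradicts an example
        elif pragmatic:
            # Size principle: each example is assumed drawn uniformly from the
            # rule's allowed strings, so more specific rules score higher.
            scores[name] = Fraction(1, len(allowed_lengths)) ** len(examples)
        else:
            # Literal synthesizer: every consistent rule is equally good.
            scores[name] = Fraction(1)
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


examples = ["A", "AAA", "AAAAA"]
print(posterior(examples, pragmatic=False))  # the two consistent rules get 1/2 each
print(posterior(examples, pragmatic=True))   # "odd number of As" gets 8/9
```

Both synthesizers rule out “even number of As,” but only the pragmatic one treats the missing even-length examples as evidence.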

By studying how humans use Bayesian inference, Ouyang is working to improve computers’ ability to recognize that the information they receive, such as the statement “I liked some parts of your paper” or the command “pick up as much dirt as possible,” was purposefully selected to convey something beyond the literal meaning. His goal is to produce a concrete tool, a pragmatic synthesizer, that people can use to communicate more effectively with computers.

The communication gap between computers and humans is one of the central problems in AI safety, and Ouyang hopes that a pragmatic synthesizer will help close this gap. If AIs can reason more deeply about what people say to them, they will more effectively create the beneficial outcomes that we want.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

Towards a Code of Ethics in Artificial Intelligence with Paula Boddington

AI promises a smarter world – a world where finance algorithms analyze data better than humans, self-driving cars save millions of lives from accidents, and medical robots eradicate disease. But machines aren’t perfect. Whether an automated trading agent buys the wrong stock, a self-driving car hits a pedestrian, or a medical robot misses a cancerous tumor – machines will make mistakes that severely impact human lives.

Paula Boddington, a philosopher based in the Department of Computer Science at Oxford, argues that AI’s power for good and bad makes it crucial that researchers consider the ethical importance of their work at every turn. To encourage this, she is taking steps to lay the groundwork for a code of AI research ethics.

Codes of ethics serve a role in any field that impacts human lives, such as medicine or engineering. Tech organizations like the Institute of Electrical and Electronics Engineers (IEEE) and the Association for Computing Machinery (ACM) also maintain codes of ethics to keep technology beneficial, but no concrete ethical framework exists to guide all researchers involved in AI’s development. By codifying AI research ethics, Boddington suggests, researchers can more clearly frame AI’s development within society’s broader quest to improve human wellbeing.

To better understand AI ethics, Boddington has considered various areas including autonomous trading agents in finance, self-driving cars, and biomedical technology. In all three areas, machines are not only capable of causing serious harm, but they assume responsibilities once reserved for humans. As such, they raise fundamental ethical questions.

“Ethics is about how we relate to human beings, how we relate to the world, how we even understand what it is to live a human life or what our end goals of life are,” Boddington says. “AI is raising all of those questions. It’s almost impossible to say what AI ethics is about in general because there are so many applications. But one key issue is what happens when AI replaces or supplements human agency, a question which goes to the heart of our understandings of ethics.”


The Black Box Problem

Because AI systems will assume responsibility from humans – and for humans – it’s important that people understand how these systems might fail. However, this doesn’t always happen in practice.

Consider COMPAS, the Northpointe algorithm that US courts used to predict which criminal defendants would reoffend. The algorithm weighed 100 factors such as prior arrests, family life, drug use, age and sex, and predicted the likelihood that a defendant would commit another crime. Northpointe’s developers did not specifically consider race, but when investigative journalists from ProPublica analyzed the algorithm, they found that it incorrectly labeled black defendants as “high risk” almost twice as often as white defendants. Unaware of this bias and eager to improve their criminal justice systems, states like Wisconsin, Florida, and New York trusted the algorithm for years to help determine sentences. Without understanding the tools they were using, these courts incarcerated defendants based on flawed calculations.
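
“Incorrectly labeled high risk” is a comparison of false positive rates: among defendants who did not go on to reoffend, what share had been labeled high risk? The sketch below computes that rate for each group. The records are hypothetical toy counts chosen to show a ratio of about two. They are not the real COMPAS data.

```python
from collections import defaultdict


def false_positive_rates(records):
    """records: iterable of (group, labeled_high_risk, reoffended) tuples.

    Returns, for each group, the share of people who did NOT reoffend
    but were still labeled high risk (the false positive rate).
    """
    false_positives = defaultdict(int)
    non_reoffenders = defaultdict(int)
    for group, labeled_high_risk, reoffended in records:
        if reoffended:
            continue  # only people who did not reoffend can be false positives
        non_reoffenders[group] += 1
        if labeled_high_risk:
            false_positives[group] += 1
    return {g: false_positives[g] / non_reoffenders[g] for g in non_reoffenders}


# Hypothetical toy counts, NOT ProPublica's data.
toy_records = (
    [("group A", True, True)] * 30      # reoffenders are ignored by this metric
    + [("group A", True, False)] * 38
    + [("group A", False, False)] * 62
    + [("group B", True, False)] * 20
    + [("group B", False, False)] * 80
)
rates = false_positive_rates(toy_records)
print(rates)                                          # {'group A': 0.38, 'group B': 0.2}
print(round(rates["group A"] / rates["group B"], 2))  # 1.9
```

A model can have similar overall accuracy for both groups and still show this disparity, which is why the choice of error metric matters when auditing such tools.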

The Northpointe case offers a preview of the potential dangers of deploying AI systems that people don’t fully understand. Many current machine-learning systems are so complex that no one really knows how they make decisions, not even the people who develop them. Moreover, these systems learn from their environment and update their behavior, making it even more difficult for researchers to control and understand the decision-making process. This lack of transparency, the “black box” problem, makes it extremely difficult to construct and enforce a code of ethics.

Codes of ethics are effective in medicine and engineering because professionals understand and have control over their tools, Boddington suggests. There may be some blind spots – doctors don’t know everything about the medicine they prescribe – but we generally accept this “balance of risk.”

“It’s still assumed that there’s a reasonable level of control,” she explains. “In engineering buildings there’s no leeway to say, ‘Oh I didn’t know that was going to fall down.’ You’re just not allowed to get away with that. You have to be able to work it out mathematically. Codes of professional ethics rest on the basic idea that professionals have an adequate level of control over their goods and services.”

But AI makes this difficult. Because of the “black box” problem, if an AI system sets a dangerous criminal free or recommends the wrong treatment to a patient, researchers can legitimately argue that they couldn’t anticipate that mistake.

“If you can’t guarantee that you can control it, at least you could have as much transparency as possible in terms of telling people how much you know and how much you don’t know and what the risks are,” Boddington suggests. “Ethics concerns how we justify ourselves to others. So transparency is a key ethical virtue.”


Developing a Code of Ethics

Despite the “black box” problem, Boddington believes that scientific and medical communities can inform AI research ethics. She explains: “One thing that’s really helped in medicine and pharmaceuticals is having citizen and community groups keeping a really close eye on it. And in medicine there are quite a few ‘maverick’ or ‘outlier’ doctors who question, for instance, what the end value of medicine is. That’s one of the things you need to develop codes of ethics in a robust and responsible way.”

A code of AI research ethics will also require many perspectives. “I think what we really need is diversity in terms of thinking styles, personality styles, and political backgrounds, because the tech world and the academic world both tend to be fairly homogeneous,” Boddington explains.

Not only will diverse perspectives account for different values, but they also might solve problems better, according to research from economist Lu Hong and political scientist Scott Page. Using a computational model of problem solving, Hong and Page found that if you compare two groups solving a problem, one made up of the individually best-performing problem solvers and one made up of randomly chosen, more diverse but individually weaker solvers, the diverse group often solves the problem better.
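
Hong and Page reached their result with an agent-based model. In that model, each problem solver is a simple hill-climbing heuristic searching a ring of random values. The sketch below is a simplified, hypothetical model of that kind, with small illustrative parameters: 200 points, step sizes 1 to 8, and teams of 10. It shows how individual “ability” and team performance can be measured. It is not their code, it is not expected to reproduce their numbers, and whether the random team wins in a single run depends on the random seed.

```python
import random
from itertools import permutations


def climb(landscape, start, heuristic):
    """One solver: try each step size in order, move on any improvement, stop when stuck."""
    n = len(landscape)
    pos = start
    improved = True
    while improved:
        improved = False
        for step in heuristic:
            candidate = (pos + step) % n
            if landscape[candidate] > landscape[pos]:
                pos = candidate
                improved = True
                break  # restart from the first step size at the new position
    return pos


def team_climb(landscape, start, team):
    """Solvers take turns from the current best point until none can improve it."""
    pos = start
    improved = True
    while improved:
        improved = False
        for heuristic in team:
            new_pos = climb(landscape, pos, heuristic)
            if new_pos != pos:
                pos = new_pos
                improved = True
    return pos


def score(landscape, team):
    """Average value reached, over every starting point on the ring."""
    n = len(landscape)
    return sum(landscape[team_climb(landscape, s, team)] for s in range(n)) / n


rng = random.Random(0)
landscape = [rng.uniform(0, 100) for _ in range(200)]  # hypothetical ring of 200 points
solvers = list(permutations(range(1, 9), 3))           # ordered triples of step sizes 1..8
ranked = sorted(solvers, key=lambda h: score(landscape, [h]), reverse=True)
best_team = ranked[:10]
random_team = rng.sample(solvers, 10)
print("team of top solvers:", round(score(landscape, best_team), 2))
print("random team:        ", round(score(landscape, random_team), 2))
```

The intuition behind the result is visible in `team_climb`: a team only gets stuck at points where every member’s heuristic is stuck. Top solvers tend to share similar heuristics and therefore similar blind spots, while a random team is more likely to include a member who can escape.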


Laying the Groundwork

This fall, Boddington will release the main output of her project: a book titled Towards a Code of Ethics for Artificial Intelligence. She readily admits that the book can’t cover every ethical dilemma in AI, but it should help demonstrate how tricky it is to develop codes of ethics for AI and spur more discussion on issues like how codes of professional ethics can deal with the “black box” problem.

Boddington has also collaborated with the IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems, which recently released a report exhorting researchers to look beyond the technical capabilities of AI, and “prioritize the increase of human wellbeing as our metric for progress in the algorithmic age.”

Although a formal code is only part of what’s needed for the development of ethical AI, Boddington hopes that this discussion will eventually produce a code of AI research ethics. With a robust code, researchers will be better equipped to guide artificial intelligence in a beneficial direction.


Aligning Superintelligence With Human Interests

The trait that currently gives humans a dominant advantage over other species is intelligence. Human advantages in reasoning and resourcefulness have allowed us to thrive. However, this may not always be the case.

Although superintelligent AI systems may be decades away, Benya Fallenstein – a research fellow at the Machine Intelligence Research Institute – believes “it is prudent to begin investigations into this technology now.” The more time scientists and researchers have to prepare for a system that could eventually be smarter than us, the better.

A smarter-than-human AI system could potentially develop the tools necessary to exert control over humans. At the same time, highly capable AI systems may not possess a human sense of fairness, compassion, or conservatism. Consequently, the AI system’s single-minded pursuit of its programmed goals could cause it to deceive programmers, attempt to seize resources, or otherwise exhibit adversarial behaviors.

Fallenstein believes researchers must “ensure that AI would behave in ways that are reliably aligned with human interests.” However, even highly reliable agent programming does not guarantee a positive impact; the effects of the system still depend upon whether it is pursuing human-approved goals. A superintelligent system may find clever, unintended ways to achieve the specific goals that it is given.

For example, imagine a superintelligent system designed to cure cancer “without doing anything bad.” This goal is rooted in cultural context and shared human knowledge. The AI may not completely understand what qualifies as “bad.” Therefore, it may try to cure cancer by stealing resources, proliferating robotic laboratories at the expense of the biosphere, kidnapping test subjects, or all of the above.

If a current AI system gets out of hand, researchers simply shut it down and modify its source code. However, modifying superintelligent systems could prove to be more difficult, if not impossible. A system could acquire new hardware, alter its software, or take other actions that would leave the original programmers with only dubious control over the agent. And since most programmed goals are better achieved if the system stays operational and continues pursuing its goals than if it is deactivated or its goals are changed, systems will naturally tend to have an incentive to resist shutdown and to resist modifications to their goals.

Fallenstein explains that, in order to ensure that the development of superintelligent AI has a positive impact on the world, “it must be constructed in such a way that it is amenable to correction, even if it has the ability to prevent or avoid correction.” The goal is not to design systems that fail in their attempts to deceive the programmers; the goal is to understand how highly intelligent and general-purpose reasoners with flawed goals can be built to have no incentive to deceive programmers in the first place. In other words, the intent is for the first highly capable systems to be “corrigible,” i.e., for them to recognize that their goals and other features are works in progress, and to work with programmers to identify and fix errors.

Little is known about the design or implementation details of such systems because everything, at this point, is hypothetical: no superintelligent AI systems exist yet. As a consequence, the research described below focuses on formal agent foundations for AI alignment research, that is, on developing the basic conceptual tools and theories that are most likely to be useful for engineering robustly beneficial systems in the future.

Active research into this is focused on small “toy” problems and models of corrigible agents, in the hope that insight gained there could be applied to more realistic and complex versions of the problems. Fallenstein and her team sought to illuminate the key difficulties of AI alignment using these models. One such toy problem is the “shutdown problem,” which involves designing a set of preferences that incentivize an agent to shut down upon the press of a button without also incentivizing the agent to either cause or prevent the pressing of that button. This would tell researchers whether a utility function could be specified such that agents using that function switch their preferences on demand, without having incentives to cause or prevent the switching.
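
The tension in the shutdown problem can be seen in a toy expected-utility calculation. All numbers are hypothetical. Finishing the task is worth 10 utility, the operator presses the button with probability 0.5 if the agent leaves it alone, and either interfering action costs 1. A low utility for being shut down gives the agent an incentive to disable the button, while a high one gives it an incentive to press the button itself. Exactly matching the two values removes both incentives in this toy, but only because everything here is known and fixed. The research question is whether preferences like this can be specified robustly for general agents.

```python
def expected_utilities(u_shutdown, u_task=10.0, p_press=0.5, effort_cost=1.0):
    """Expected utility of each option for a naive agent (all parameters hypothetical)."""
    return {
        "leave button alone": p_press * u_shutdown + (1 - p_press) * u_task,
        "disable button": u_task - effort_cost,
        "press button itself": u_shutdown - effort_cost,
    }


def best_action(u_shutdown):
    eu = expected_utilities(u_shutdown)
    return max(eu, key=eu.get)


for u_shutdown in (0.0, 20.0, 10.0):
    print(u_shutdown, best_action(u_shutdown))
# 0.0 disable button        (incentive to prevent the press)
# 20.0 press button itself  (incentive to cause the press)
# 10.0 leave button alone   (indifferent, so it does not interfere)
```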

Studying models in this formal logical setting has led to partial solutions, and further progress may come from research into methods for reasoning under logical uncertainty.

The largest result thus far under this research program is “logical induction,” a line of research led by Scott Garrabrant. It provides a new model of deductively limited reasoning.

Logical uncertainty is the kind of uncertainty we have about mathematical questions that are too difficult for us to settle one way or the other right now. For example, a typical human mind can’t quickly answer the question:

What’s the 10^100th digit of pi?

Further, nobody has the computational resources to work this out in a reasonable amount of time. Despite problems like this, mathematicians hold reasoned beliefs about how likely unproven conjectures are to be true, so they must implicitly be using some criterion to judge the probability that a mathematical statement is true. The logical induction framework proposes such a criterion, and the team proved that a computable logical inductor (an algorithm producing probability assignments that satisfy the criterion) exists.

The research team presented a computable algorithm that outpaces deduction, assigning high subjective probabilities to provable conjectures and low probabilities to disprovable conjectures long before the proofs can be produced. Among other accomplishments, the algorithm learns to reason competently about its own beliefs and trust its future beliefs while avoiding paradox. This gives some formal backing to the thought that real-world probabilistic agents can often be reasonably confident in their future reasoning in practice.

The team believes “there’s a good chance that this framework will open up new avenues of study in questions of metamathematics, decision theory, game theory, and computational reflection that have long seemed intractable.” They are also “cautiously optimistic” that the framework will improve our understanding of decision theory, counterfactual reasoning, and other problems related to AI value alignment.

At the same time, Fallenstein’s team doesn’t believe that all parts of the problem must be solved in advance. In fact, “the task of designing smarter, safer, more reliable systems could be delegated to early smarter-than-human systems.” This can only work, though, if the research done by the AI can be trusted.

According to Fallenstein, this “call to arms” is vital, and “significant effort must be focused on the study of superintelligence alignment as soon as possible.” It is important to develop a formal understanding of AI alignment well in advance of making design decisions about smarter-than-human systems. Beginning the work early carries an unavoidable risk that it will turn out to be irrelevant. However, failing to prepare could be even worse.


Using History to Chart the Future of AI: An Interview with Katja Grace

The million-dollar question in AI circles is: When? When will artificial intelligence become so smart and capable that it surpasses human beings at every task?

AI is already visible in the world through job automation, algorithmic financial trading, self-driving cars and household assistants like Alexa, but these developments are trivial compared to the idea of artificial general intelligence (AGI) – AIs that can perform a broad range of intellectual tasks just as humans can. Many computer scientists expect AGI at some point, but hardly anyone agrees on when it will be developed.

Given the unprecedented potential of AGI to create a positive or destructive future for society, many worry that humanity cannot afford to be surprised by its arrival. A surprise is not inevitable, however, and Katja Grace believes that if researchers can better understand the speed and consequences of advances in AI, society can prepare for a more beneficial outcome.


AI Impacts

Grace, a researcher for the Machine Intelligence Research Institute (MIRI), argues that, while we can’t chart the exact course of AI’s improvement, it is not completely unpredictable. Her project AI Impacts is dedicated to identifying and conducting cost-effective research projects that can shed light on when and how AI will impact society in the coming years. She aims to “help improve estimates of the social returns to AI investment, identify neglected research areas, improve policy, or productively channel public interest in AI.”

AI Impacts asks such questions as: How rapidly will AI develop? How much advance notice should we expect to have of disruptive change? What are the likely economic impacts of human-level AI? Which paths to AI should be considered plausible or likely? Can we say anything meaningful about the impact of contemporary choices on long-term outcomes?

One way to get an idea of these timelines is to ask the experts. In AI Impacts’ 2016 survey of 352 AI researchers, these researchers predicted a 50 percent chance that AI will outcompete humans in almost everything by 2060. However, the experts also answered a very similar question with a date seventy-five years later, and their individual answers varied enormously, making it difficult to rule anything out. Grace hopes her research with AI Impacts will inform and improve these estimates.


Learning from History

Some thinkers believe that AI could progress rapidly, without much warning. This is based on the observation that algorithms don’t need factories, and so could in principle progress at the speed of a lucky train of thought.

However, Grace argues that while we have not developed human-level AI before, our vast experience developing other technologies can tell us a lot about what will happen with AI. Studying the timelines of other technologies can inform the AI timeline.

In one of her research projects, Grace studies jumps in technological progress throughout history, measuring these jumps in terms of how many years of progress happen in one ‘go’. “We’re interested in cases where more than a decade of progress happens in one go,” she explains. “The case of nuclear weapons is really the only case we could find that was substantially more than 100 years of progress in one go.”

For example, physicists began to consider nuclear energy in 1939, and by 1945 the US successfully tested a nuclear weapon. As Grace writes, “Relative effectiveness [of explosives] doubled less than twice in the 1100 years prior to nuclear weapons, then it doubled more than eleven times when the first nuclear weapons appeared. If we conservatively model previous progress as exponential, this is around 6000 years of progress in one step [compared to] previous rates.”
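
Grace’s estimate can be reproduced from the figures in the quotation. Fewer than two doublings in 1,100 years means each doubling took more than 550 years at the old rate, so eleven doublings correspond to more than 11 × 550 = 6,050 years of progress. The sketch below treats both bounds as exact values, so its result is a conservative lower bound.

```python
def years_of_progress(prior_doublings, prior_years, jump_doublings):
    """Years the previous exponential trend would need to match a sudden jump."""
    years_per_doubling = prior_years / prior_doublings
    return jump_doublings * years_per_doubling


# Bounds from the quotation: < 2 doublings in 1100 years before, > 11 doublings at once.
print(years_of_progress(prior_doublings=2, prior_years=1100, jump_doublings=11))  # 6050.0
```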

Grace also considered the history of high-temperature superconductors. Since the discovery of superconductivity in 1911, peak temperatures for superconduction rose slowly, from 4 K (kelvin) initially to about 30 K in the 1980s. Then in 1986, scientists discovered a new class of ceramics that increased the maximum temperature to 130 K in just seven years. “That was close to 100 years of progress in one go,” she explains.

Nuclear weapons and superconductors are rare cases – most of the technologies that Grace has studied either don’t demonstrate discontinuity, or only show about 10-30 years of progress in one go. “The main implication of what we have done is that big jumps are fairly rare, so that should not be the default expectation,” Grace explains.

Furthermore, AI’s progress largely depends on how fast hardware and software improve, and those are processes we can observe now. For instance, if hardware progress starts to slow from its long-run exponential trend, we should expect human-level AI to arrive later.

Grace is currently investigating these unknowns about hardware. She wants to know “how fast the price of hardware is decreasing at the moment, how much hardware helps with AI progress relative to e.g. algorithmic improvements, and how custom hardware matters.”


Intelligence Explosion

AI researchers and developers must also be prepared for the possibility of an intelligence explosion – the idea that strong AI will improve its intelligence faster than humans could possibly understand or control.

Grace explains: “The thought is that once the AI becomes good enough, the AI will do its own AI research (instead of humans), and then we’ll have AI doing AI research where the AI research makes the AI smarter and then the AI can do even better AI research. So it will spin out of control.”

But she suggests that this feedback loop isn’t entirely unpredictable. “We already have intelligent [people] doing AI research that leads to better capabilities,” Grace explains. “We don’t have a perfect idea of what those things will be like when the AI is as intelligent as humans or as good at AI research, but we have some evidence about it from other places and we shouldn’t just be saying the spinning out of control could happen at any speed. We can get some clues about it now. We can say something about how many extra IQ points of AI you get for a year of research or effort, for example.”

AI Impacts is an ongoing project, and Grace hopes her research will find its way into conversations about intelligence explosions and other aspects of AI. With better-informed timeline estimates, perhaps policymakers and philanthropists can more effectively ensure that advanced AI doesn’t catch humanity by surprise.


Artificial Intelligence and the Future of Work: An Interview With Moshe Vardi

“The future of work is now,” says Moshe Vardi. “The impact of technology on labor has become clearer and clearer by the day.”

Machines have already automated millions of routine, working-class jobs in manufacturing. And now, AI is learning to automate non-routine jobs in transportation and logistics, legal writing, financial services, administrative support, and healthcare.

Vardi, a computer science professor at Rice University, recognizes this trend and argues that AI poses a unique threat to human labor.


Initiating a Policy Response

From the Luddite movement to the rise of the Internet, people have worried that advancing technology would destroy jobs. Yet despite painful adjustment periods during these changes, new jobs replaced old ones, and most workers found employment. But humans have never competed with machines that can outperform them in almost anything. AI threatens to do this, and many economists worry that society won’t be able to adapt.

“What people are now realizing is that this formula that technology destroys jobs and creates jobs, even if it’s basically true, it’s too simplistic,” Vardi explains.

The relationship between technology and labor is more complex: Will technology create enough jobs to replace those it destroys? Will it create them fast enough? And for workers whose skills are no longer needed – how will they keep up?

To address these questions and consider policy responses, Vardi will hold a summit in Washington, D.C. on December 12, 2017. The summit will address six current issues within technology and labor: education and training, community impact, job polarization, contingent labor, shared prosperity, and economic concentration.

Education and training

A 2013 computerization study found that 47% of American workers held jobs at high risk of automation in the next decade or two. If this happens, technology must create roughly 100 million jobs.

As the labor market changes, schools must teach students skills for future jobs, while at-risk workers need accessible training for new opportunities. Truck drivers won’t transition easily to website design and coding jobs without proper training, for example. Vardi expects that adapting to and training for new jobs will become more challenging as AI automates a greater variety of tasks. 

Community impact

Manufacturing jobs are concentrated in specific regions where employers keep local economies afloat. Over the last thirty years, the loss of 8 million manufacturing jobs has crippled Rust Belt regions in the U.S. – both economically and culturally.

Today, the fifteen million jobs that involve operating a vehicle are concentrated in certain regions as well. Drivers occupy up to 9% of jobs in the Bronx and Queens districts of New York City, up to 7% of jobs in select Southern California and Southern Texas districts, and over 4% in Wyoming and Idaho. Automation could quickly assume the majority of these jobs, devastating the communities that rely on them.

Job polarization

“One in five working class men between ages 25 to 54 without college education are not working,” Vardi explains. “Typically, when we see these numbers, we hear about some country in some horrible economic crisis like Greece. This is really what’s happening in working class America.”

Employment is currently growing in high-income cognitive jobs and low-income service jobs, such as elderly assistance and fast-food service, which computers cannot automate yet. But technology is hollowing out the economy by automating middle-skill, working-class jobs first.

Many manufacturing jobs pay $25 per hour with benefits, but these jobs aren’t easy to come by. Since 2000, when millions of these jobs disappeared, displaced workers have either left the labor force or accepted service jobs that often pay $12 per hour, without benefits.

Truck driving, the most common job in over half of US states, may see a similar fate.

Source: IPUMS-CPS/University of Minnesota. Credit: Quoctrung Bui/NPR.

Contingent labor

Increasingly, communications technology allows firms to save money by hiring freelancers and independent contractors instead of permanent workers. This has created the Gig Economy – a labor market characterized by short-term contracts and flexible hours at the cost of unstable jobs with fewer benefits. By some estimates, in 2016, one in three workers were employed in the Gig Economy, but not all by choice. Policymakers must ensure that this new labor market supports its workers.

Shared prosperity

Automation has decoupled job creation from economic growth, allowing the economy to grow while employment and income shrink, thus increasing inequality. Vardi worries that AI will accelerate these trends. He argues that policies encouraging economic growth must also support economic mobility for the middle class.

Economic concentration

Technology creates a “winner-takes-all” environment, where second best can hardly survive. Bing search is quite similar to Google search, but Google is much more popular than Bing. And do Facebook or Amazon have any legitimate competitors?

Startups and smaller companies struggle to compete with these giants because of data. Having more users allows companies to collect more data, which machine-learning systems then analyze to help companies improve. Vardi thinks that this feedback loop will give big companies long-term market power.

Moreover, Vardi argues that these companies create relatively few jobs. In 1990, Detroit’s three largest companies were valued at $65 billion with 1.2 million workers. In 2016, Silicon Valley’s three largest companies were valued at $1.5 trillion but with only 190,000 workers.
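Dividing market value by headcount makes the contrast concrete. The short Python sketch below uses only the figures quoted above, in nominal dollars with no inflation adjustment, so it is a rough illustration rather than a rigorous economic comparison.

```python
# Market value per worker, using the figures quoted in the paragraph above.
# Values are nominal dollars; no inflation adjustment is applied.

def value_per_worker(market_value_usd: float, workers: int) -> float:
    """Return total market value divided by the number of workers."""
    return market_value_usd / workers


detroit_1990 = value_per_worker(65e9, 1_200_000)
silicon_valley_2016 = value_per_worker(1.5e12, 190_000)

print(f"Detroit's three largest, 1990:        ${detroit_1990:,.0f} per worker")
print(f"Silicon Valley's three largest, 2016: ${silicon_valley_2016:,.0f} per worker")
print(f"Ratio: about {silicon_valley_2016 / detroit_1990:.0f} times")
```

This works out to roughly $54,000 of market value per worker in 1990 versus roughly $7.9 million in 2016, about 146 times more.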

 

Work and society

Vardi primarily studies current job automation, but he also worries that AI could eventually leave most humans unemployed. He explains, “The hope is that we’ll continue to create jobs for the vast majority of people. But if the situation arises that this is less and less the case, then we need to rethink: how do we make sure that everybody can make a living?”

Vardi also anticipates that high unemployment could lead to violence or even uprisings. He refers to Andrew McAfee’s closing statement at the 2017 Asilomar AI Conference, where McAfee said, “If the current trends continue, the people will rise up before the machines do.”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

How Self-Driving Cars Use Probability

Even though human drivers don’t consciously think in terms of probabilities, we observe our environment and make decisions based on the likelihood of certain things happening. A driver doesn’t calculate the probability that the sports car behind her will pass her, but through observing the car’s behavior and considering similar situations in the past, she makes her best guess.

We rely on probabilities because they are the only way to act in the midst of uncertainty.

Autonomous systems such as self-driving cars will make similar decisions based on probabilities, but through a different process. Unlike humans, who rely on intuition and experience, autonomous cars calculate the probability of different scenarios using sensor data and reasoning algorithms.

 

How to Determine Probability

Stefano Ermon, a computer scientist at Stanford University, wants to make self-driving cars and autonomous systems safer and more reliable by improving the way they reason probabilistically about their environment. He explains, “The challenge is that you have to take actions and you don’t know what will happen next. Probabilistic reasoning is just the idea of thinking about the world in terms of probabilities, assuming that there is uncertainty.”

Achieving safety requires two main components. First, the computer model must collect accurate data, and second, the reasoning system must be able to draw the right conclusions from the model’s data.

Ermon explains, “You need both: to build a reliable model you need a lot of data, and then you need to be able to draw the right conclusions based on the model, and that requires the artificial intelligence to think about these models accurately. Even if the model is right, but you don’t have a good way to reason about it, you can do catastrophic things.”

For example, in the context of autonomous vehicles, models use various sensors to observe the environment and collect data about countless variables, such as the behavior of the drivers around you, potholes and other obstacles in front of you, weather conditions—every possible data point.

A reasoning system then interprets this data. It uses the model’s information to decide whether the driver behind you is dangerously aggressive, whether the pothole ahead will puncture your tire, and whether the rain is obstructing visibility, and it continuously adjusts the car’s behavior in response to these variables.

Consider the aggressive driver behind you. As Ermon explains, “Somehow you need to be able to reason about these models. You need to come up with a probability. You don’t know what the car’s going to do but you can estimate, and based on previous behavior you can say this car is likely to cut the line because it has been driving aggressively.”
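One textbook way to turn “it has been driving aggressively” into a number is a Bayesian update. The Python sketch below is not Ermon’s method; it assumes that each observation window either does or does not contain an aggressive maneuver by the car behind you, and it places a Beta prior (with hypothetical pseudo-counts) on the probability of such a maneuver.

```python
from dataclasses import dataclass


@dataclass
class AggressionEstimate:
    """Beta-Bernoulli estimate of how likely a nearby car is to cut in.

    alpha and beta are prior pseudo-counts (hypothetical values): the defaults
    give a prior mean of 0.1, i.e. most drivers rarely make aggressive moves.
    """
    alpha: float = 1.0
    beta: float = 9.0

    def observe(self, aggressive: bool) -> None:
        # Each observation window adds one count to the matching side.
        if aggressive:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def probability(self) -> float:
        # Posterior mean of the Beta distribution.
        return self.alpha / (self.alpha + self.beta)


car_behind = AggressionEstimate()
prior = car_behind.probability
# Hypothetical observations: three aggressive windows out of four.
for aggressive in [True, True, False, True]:
    car_behind.observe(aggressive)
print(f"prior {prior:.2f} -> posterior {car_behind.probability:.2f}")
```

After three aggressive moves in four windows, the estimate rises from 0.10 to about 0.29. The prior keeps a single observation from swinging the estimate too far in either direction.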

 

Improving Probabilistic Reasoning

Ermon is developing robust algorithms that can synthesize all of the data that a model produces and make reliable decisions.

As models improve, they collect more information and capture more variables relevant to making these decisions. But as Ermon notes, “the more complicated the model is, the more variables you have, the more complicated it becomes to make the optimal decisions based on the model.”
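The growth Ermon describes shows up even in the most naive form of exact inference: summing over every joint assignment of the model’s variables. In the Python sketch below, the chain-shaped model and its parameters are hypothetical stand-ins for a driving model; the point is only that each added binary variable doubles the work.

```python
import itertools
import math


def chain_weight(assignment, coupling=0.5, bias=0.2):
    """Unnormalized weight of one joint assignment in a hypothetical chain model:
    neighbouring variables prefer to agree, and each variable slightly prefers 1."""
    score = bias * sum(assignment)
    score += coupling * sum(a == b for a, b in zip(assignment, assignment[1:]))
    return math.exp(score)


def brute_force_marginal(n_vars, weight, query_index=0):
    """Exact P(X[query_index] = 1) by enumerating all 2**n_vars assignments.

    Returns the probability and the number of assignments visited."""
    total = query_mass = 0.0
    visited = 0
    for assignment in itertools.product((0, 1), repeat=n_vars):
        w = weight(assignment)
        total += w
        if assignment[query_index] == 1:
            query_mass += w
        visited += 1
    return query_mass / total, visited


for n in (4, 8, 12, 16):
    p, visited = brute_force_marginal(n, chain_weight)
    print(f"{n:2d} variables: {visited:6d} assignments, P(X0 = 1) = {p:.3f}")
```

This particular chain could be solved efficiently with dynamic programming, but general models in which many variables interact have no such shortcut, which is why better reasoning algorithms matter as models grow.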

Thus, as data collection expands, the analysis must also improve: the artificial intelligence in these cars must be able to reason with increasingly complex data.

And this reasoning can easily go wrong. “You need to be very precise when computing these probabilities,” Ermon explains. “If the probability that a car cuts into your lane is 0.1, but you completely underestimate it and say it’s 0.01, you might end up making a fatal decision.”
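Why a factor-of-ten error matters can be seen in a simple expected-cost decision. The Python sketch below uses two hypothetical actions and made-up costs in arbitrary units; real systems weigh far more outcomes, but the mechanism is the same: the estimated probability determines which action looks cheapest.

```python
# Hypothetical costs in arbitrary units; real systems would use far richer models.
COSTS = {
    ("keep_speed", "cuts_in"): 1000.0,  # likely collision
    ("keep_speed", "stays"): 0.0,
    ("slow_down", "cuts_in"): 20.0,     # braking costs time and comfort either way
    ("slow_down", "stays"): 20.0,
}
ACTIONS = ("keep_speed", "slow_down")


def expected_cost(action: str, p_cut_in: float) -> float:
    """Expected cost of an action given the estimated probability of a cut-in."""
    return p_cut_in * COSTS[(action, "cuts_in")] + (1 - p_cut_in) * COSTS[(action, "stays")]


def best_action(p_cut_in: float) -> str:
    """Pick the action with the lowest expected cost."""
    return min(ACTIONS, key=lambda action: expected_cost(action, p_cut_in))


for p in (0.1, 0.01):
    print(p, best_action(p))
```

With these costs the decision flips at a probability of 0.02, where 1000 times p equals 20. The true probability of 0.1 calls for slowing down, while the underestimated 0.01 says to keep speed.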

To avoid fatal decisions, the artificial intelligence must be robust, but the data must also be complete. If the model collects incomplete data, “you have no guarantee that the number that you get when you run this algorithm has anything to do with the actual probability of that event,” Ermon explains.

The model and the algorithm entirely depend on each other to produce the optimal decision. If the model is incomplete and fails to capture the black ice in front of you, no reasoning system will be able to make a safe decision. And even if the model captures the black ice and every other possible variable, if the reasoning system cannot handle the complexity of this data, again the car will fail.

 

How Safe Will Autonomous Systems Be?

The technology in self-driving cars has made huge leaps lately, and Ermon is hopeful. “Eventually, as computers get better and algorithms get better and the models get better, hopefully we’ll be able to prevent all accidents,” he suggests.

However, there are still fundamental limitations on probabilistic reasoning. “Most computer scientists believe that it is impossible to come up with the silver bullet for this problem, an optimal algorithm that is so powerful that it can reason about all sorts of models that you can think about,” Ermon explains. “That’s the key barrier.”

But despite this barrier, self-driving cars will soon be available to consumers. Ford, for one, has promised to put its self-driving cars on the road by 2021. And while most computer scientists expect these cars to be far safer than human drivers, their success depends on their ability to reason probabilistically about their environment.

As Ermon explains, “You need to be able to estimate these kinds of probabilities because they are the building blocks that you need to make decisions.”


Making Deep Learning More Robust

Imagine how much more efficient lawyers could be if they had the time to read every legal book ever written and review every case ever brought to court. Imagine doctors with the ability to study every advancement published across the world’s medical journals, or consult every medical case, ever. Unfortunately, the human brain cannot store that much information, and it would take decades to achieve these feats.

But a computer, one specifically designed to work like the human mind, could.

Deep learning neural networks are designed to mimic the human brain’s neural connections. They learn through continuous exposure to huge amounts of data, which allows them to recognize patterns, comprehend complex concepts, and translate high-level abstractions. These networks consist of many layers, each with its own set of weights. Deeper networks can generally learn more complex patterns, although depth alone does not guarantee better performance.
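The layered structure can be sketched in a few lines of Python. This is a hypothetical two-layer, fully connected network with hand-picked weights, not a trained or convolutional model: each layer multiplies its input by its own weight matrix, adds a bias, and (except for the last layer) applies a ReLU nonlinearity.

```python
def relu(v):
    """Elementwise ReLU nonlinearity."""
    return [max(0.0, x) for x in v]


def dense(v, weights, biases):
    """One fully connected layer: output[j] = sum_i weights[j][i] * v[i] + biases[j]."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, biases)]


def forward(v, layers):
    """Pass v through each (weights, biases) layer, applying ReLU to all but the last."""
    for k, (weights, biases) in enumerate(layers):
        v = dense(v, weights, biases)
        if k < len(layers) - 1:
            v = relu(v)
    return v


# Hypothetical 2-input, 3-hidden-unit, 2-output network with hand-picked weights.
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 0.0, 1.0], [0.0, 2.0, -1.0]], [0.0, 0.1]),
]
print(forward([1.0, 2.0], layers))
```

A deep network stacks many more such layers, and training adjusts every weight from data instead of setting it by hand.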

Current applications for these networks include medical diagnosis, robotics and engineering, face recognition, and automotive navigation. However, deep learning is still in development – not surprisingly, it is a huge undertaking to get machines to think like humans. In fact, very little is understood about these networks, and months of manual tuning are often required to obtain excellent performance.

Fuxin Li, assistant professor at the Oregon State University School of Electrical Engineering and Computer Science, and his team are taking on the accuracy of these neural networks under adversarial conditions. Their research focuses on the basic machine learning aspects of deep learning, and how to make general deep learning more robust.

To try to better understand when a deep convolutional neural network (CNN) is going to be right or wrong, Li’s team had to establish an estimate of confidence in the predictions of the deep learning architecture. Those estimates can be used as safeguards when utilizing the networks in real life.

“Basically,” explains Li, “trying to make deep learning increasingly self-aware – to be aware of what type of data it has seen, and what type of data it could work on.”

The team looked at recent advances in deep learning, which have greatly improved the capability to recognize images automatically. Although these networks are very resistant to overfitting, researchers discovered that they fail completely if some of the pixels in an image are perturbed by an adversarial optimization algorithm.

To a human observer, the altered image may look fine, but the deep network sees something else. According to the researchers, such adversarial examples are dangerous if a deep network is used in any critical real-world application, such as autonomous driving. If the network’s output can be hacked, wrong authentications and other devastating effects would be unavoidable.
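The article does not say which adversarial optimization algorithm the researchers used. One widely known example is the fast gradient sign method (FGSM), which shifts every input value by a small step in the direction that most increases the classifier’s loss. The Python sketch below applies it to a hypothetical logistic-regression model over 100 “pixels” rather than a deep CNN.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x) -> float:
    """Probability of class 1 under a logistic-regression stand-in for an image classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(g: float) -> int:
    return (g > 0) - (g < 0)


def fgsm(w, b, x, y, epsilon):
    """Fast gradient sign method: nudge every input by +/- epsilon in the direction
    that increases the logistic loss for the true label y (0 or 1)."""
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]  # gradient of the logistic loss w.r.t. each input
    return [xi + epsilon * sign(gi) for xi, gi in zip(x, grad)]


# Hypothetical 100-"pixel" image and weights.
w = [0.05 if i % 2 == 0 else -0.05 for i in range(100)]
b = 0.2
x = [0.5] * 100
x_adv = fgsm(w, b, x, y=1, epsilon=0.1)
print(f"clean: P(class 1) = {predict(w, b, x):.2f}")
print(f"adversarial: P(class 1) = {predict(w, b, x_adv):.2f}")
```

No pixel changes by more than 0.1, yet the many small, coordinated changes add up and flip the prediction from class 1 to class 0. In high-dimensional inputs such as real images, the same effect can come from changes too small for a person to notice.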

In a departure from previous approaches, which focused on improving classifiers so that they correctly classify adversarial examples, the team focused on detecting adversarial examples by analyzing whether they come from the same distribution as normal examples. Detection accuracy exceeded 96%. Notably, 90% of adversarial examples could be detected with a false positive rate below 10%.
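The article does not detail the team’s detection method, so the Python sketch below shows only the general idea of a distribution test, under stated assumptions: each input is summarized by a hypothetical feature vector (for example, statistics of a network’s internal activations), a detector learns what clean features look like, and the flagging threshold is calibrated on held-out clean data to a target false positive rate. The “adversarial” features here are synthetic, so the numbers say nothing about the team’s results.

```python
import random
import statistics


def fit(clean_features):
    """Per-dimension mean and standard deviation of features from clean inputs."""
    columns = list(zip(*clean_features))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard against zero spread
    return means, stds


def score(x, means, stds):
    """Sum of squared z-scores: large when x looks unlike the clean data."""
    return sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, means, stds))


def calibrate_threshold(clean_scores, target_fpr):
    """Threshold above which roughly target_fpr of clean inputs would be flagged."""
    ordered = sorted(clean_scores)
    index = min(len(ordered) - 1, int((1 - target_fpr) * len(ordered)))
    return ordered[index]


rng = random.Random(0)
clean = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(2000)]
train, calib = clean[:1000], clean[1000:]
means, stds = fit(train)
threshold = calibrate_threshold([score(x, means, stds) for x in calib], target_fpr=0.10)

# Synthetic stand-in for adversarial inputs: features drawn from a shifted distribution.
suspicious = [[rng.gauss(1.5, 1) for _ in range(8)] for _ in range(200)]
false_positive_rate = sum(score(x, means, stds) > threshold for x in calib) / len(calib)
detection_rate = sum(score(x, means, stds) > threshold for x in suspicious) / len(suspicious)
print(f"false positive rate: {false_positive_rate:.2f}, detection rate: {detection_rate:.2f}")
```

Fixing the false positive rate first and then asking how many abnormal inputs get caught mirrors the way the result above is stated: a detection rate at a given false positive rate.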

The benefits of this research are numerous. It is vital for a neural network to be able to identify whether an example comes from a normal or an adversarial distribution. Such knowledge, if available, will significantly help control the behavior of robots that employ deep learning. A reliable procedure can prevent robots from behaving in an undesirable manner because of false perceptions they form about the environment.

Li gives one example: “In robotics there’s this big issue about robots not doing something based on erroneous perception. It’s important for a robot to know that it’s not making a confident perception. For example, if [the robot] is saying there’s an object over there, but it’s actually a wall, he’ll go to fetch that object, and then he hits a wall.”

Hopefully, Li says, that won’t happen. However, current software and machine learning rely mostly on prediction confidence computed within the original machine learning framework, which assumes that training and test data are drawn independently from the same distribution. That assumption can lead to incorrect conclusions.

Better confidence estimates could potentially help avoid incidents such as the Tesla crash of May 2016, in which a brightly lit truck in the middle of the highway acted as an adversarial example that fooled the system. But first, the computer must be smarter: it has to learn to detect objects and differentiate, say, a tree from another vehicle.

“To make it really robust, you need to account for unknown objects. Something weird may hit you. A deer may jump out.” The network can’t be taught every unexpected situation, says Li, “so you need it to discover them without knowledge of what they are. That’s something that we do. We try to bridge the gap.”

Improved training procedures will make deep learning more automatic, lead to fewer failures, and provide confidence estimates when the deep network is used to predict new data. Most of this training data, explains Li, comes from stock photos. However, these are flat images, very different from what a robot would normally see in day-to-day life, and it’s difficult to get a 360-degree view just by looking at photos.

“There will be a big difference between the thing [the robot] trains on and the thing it really sees. So then, it is important for the robot to understand that it can predict some things confidently, and others it cannot,” says Li. “[The robot] needs to understand that it probably predicted wrong, so as not to act too aggressively toward its prediction.” This can only be achieved with a more self-aware framework, which is what Li is trying to develop with this grant.

Further, these estimates can be used to control the behavior of a robot employing deep learning so that it will not go on to perform maneuvers that could be dangerous because of erroneous predictions. Understanding these aspects would also be helpful in designing potentially more robust networks in the future.
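A simple way to let confidence estimates gate a robot’s behavior is to act only on predictions above a confidence threshold and fall back to cautious behavior otherwise. In the Python sketch below, the labels, the raw network scores (logits), and the 0.9 threshold are hypothetical, and the confidence is a plain softmax probability.

```python
import math


def softmax(logits):
    """Convert raw network scores into probabilities (shifted by the max for stability)."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, labels, min_confidence=0.9):
    """Return the predicted label, or None meaning 'not confident: act cautiously'."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < min_confidence:
        return None
    return labels[best]


labels = ["object", "wall"]
for logits in ([4.0, 0.0], [0.5, 0.0]):
    choice = decide(logits, labels)
    print(logits, "->", choice if choice is not None else "uncertain: slow down and re-check")
```

As the article notes, confidence computed within the original framework can be misleading on unfamiliar or adversarial inputs, so in practice a gate like this would need a better-calibrated or distribution-aware confidence estimate behind it.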

Soon, Li and his team will start generalizing the approach to other domains, such as temporal models (RNNs, LSTMs) and deep reinforcement learning. In reinforcement learning, the confidence estimates could play an important role in many decision-making paradigms.



The Financial World of AI

Automated algorithms currently manage over half of trading volume in US equities, and as AI improves, it will continue to assume control over important financial decisions. But these systems aren’t foolproof. A small glitch could send shares plunging, potentially costing investors billions of dollars.

For firms, the decision to accept this risk is simple. The algorithms in automated systems are faster and more accurate than any human, and deploying the most advanced AI technology can keep firms in business.

But for the rest of society, the consequences aren’t clear. Artificial intelligence gives firms a competitive edge, but will these rapidly advancing systems remain safe and robust? What happens when they make mistakes?

 

Automated Errors

Michael Wellman, a professor of computer science at the University of Michigan, studies AI’s threats to the financial system. He explains, “The financial system is one of the leading edges of where AI is automating things, and it’s also an especially vulnerable sector. It can be easily disrupted, and bad things can happen.”

Consider the story of Knight Capital. On August 1, 2012, Knight decided to try out new software to stay competitive in a new trading pool. The software passed its safety tests, but when Knight deployed it, the algorithm activated its testing software instead of the live trading program. The testing software sent millions of bad orders in the following minutes as Knight frantically tried to stop it. But the damage was done.

In just 45 minutes, Knight Capital lost $440 million – nearly four times its profit in 2011 – all because of one line of code.

In this case, the damage was constrained to Knight, but what happens when one line of code can impact the entire financial system?

 

Understanding Autonomous Trading Agents

Wellman argues that autonomous trading agents are difficult to control because they process and respond to information at unprecedented speeds, they can be easily replicated on a large scale, they act independently, and they adapt to their environment.

With increasingly general capabilities, systems may learn to make money in dangerous ways that their programmers never intended. As Lawrence Pingree, an analyst at Gartner, said after the Knight meltdown, “Computers do what they’re told. If they’re told to do the wrong thing, they’re going to do it and they’re going to do it really, really well.”

In order to prevent AI systems from undermining market transparency and stability, government agencies and academics must learn how these agents work.

 

Market Manipulation

Even benign uses of AI can hinder market transparency, but Wellman worries that AI systems will learn to manipulate markets.

Autonomous trading agents are especially effective at exploiting arbitrage opportunities – where they simultaneously purchase and sell an asset to profit from pricing differences. If, for example, a stock trades at $30 in one market and $32 in a second market, an agent can buy the $30 stock and immediately sell it for $32 in the second market, making a $2 profit.
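The arbitrage check described above reduces to comparing prices for the same asset across markets. The Python sketch below is a deliberately simplified illustration with hypothetical market names; it ignores trading fees, latency, and the price impact of the trades themselves, all of which real arbitrage strategies must account for.

```python
def find_arbitrage(prices):
    """Given {market_name: price} for one asset, return
    (market_to_buy_in, market_to_sell_in, profit_per_share), or None if there is no gap.

    Ignores fees, latency, and price impact (simplifying assumptions)."""
    buy_market = min(prices, key=prices.get)
    sell_market = max(prices, key=prices.get)
    spread = prices[sell_market] - prices[buy_market]
    return (buy_market, sell_market, spread) if spread > 0 else None


print(find_arbitrage({"market_a": 30.00, "market_b": 32.00}))
```

For the example in the text, this reports buying in market_a and selling in market_b for a profit of $2.00 per share. Automated agents win at this because the gap usually closes as soon as someone trades on it.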

Market inefficiency naturally creates arbitrage opportunities. However, an AI may learn – on its own – to create pricing discrepancies by taking misleading actions that move the market to generate profit.

One manipulative technique is ‘spoofing’: placing bids for a stock with the intent to cancel them before they execute. The bids create a false impression of demand that moves the market in a certain direction, and the spoofer profits from the false signal.
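Why an order that is never filled can still move prices is easier to see with a concrete signal. Many market participants watch the balance between resting buy and sell orders in the order book. The Python sketch below uses one simple imbalance measure (an illustrative assumption, not a description of any particular firm’s algorithm) with hypothetical order sizes, and shows how a single large bid that the spoofer intends to cancel makes the book look strongly one-sided.

```python
def book_imbalance(bid_sizes, ask_sizes):
    """Order-book imbalance in [-1, 1]; positive values suggest more buying interest."""
    bids, asks = sum(bid_sizes), sum(ask_sizes)
    return (bids - asks) / (bids + asks)


genuine_bids = [300, 200]   # shares resting at the best bid prices (hypothetical)
asks = [250, 250]           # shares resting at the best ask prices (hypothetical)
spoof_bid = 5000            # large bid placed with the intent to cancel before execution

print("before spoof:", round(book_imbalance(genuine_bids, asks), 2))
print("with spoof:  ", round(book_imbalance(genuine_bids + [spoof_bid], asks), 2))
```

An algorithm that reads the jump from 0.0 to about 0.83 as genuine buying pressure may buy, lifting the price; the spoofer can then sell into that move and cancel the large bid before it is filled.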

Wellman and his team recently reproduced spoofing in their laboratory models, as part of an effort to understand the situations where spoofing can be effective. He explains, “We’re doing this in the laboratory to see if we can characterize the signature of AIs doing this, so that we [can] reliably detect it and design markets to reduce vulnerability.”
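The article does not describe the signature Wellman’s team looks for. As a purely illustrative assumption, the Python sketch below flags one intuitive pattern: a trader whose large orders are nearly always canceled shortly after being placed. The order records and every threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Order:
    trader: str
    size: int            # shares
    lifetime_ms: float   # time from placement until the order was filled or canceled
    canceled: bool


def flag_possible_spoofers(orders, min_size=1000, max_lifetime_ms=500.0, min_quick_cancel_rate=0.9):
    """Flag traders whose large orders are almost always canceled within a short time.

    All thresholds are hypothetical; a flag is a lead for review, not proof of intent."""
    counts = {}  # trader -> (large orders, large orders canceled quickly)
    for order in orders:
        if order.size < min_size:
            continue
        large, quick = counts.get(order.trader, (0, 0))
        canceled_quickly = order.canceled and order.lifetime_ms <= max_lifetime_ms
        counts[order.trader] = (large + 1, quick + int(canceled_quickly))
    return sorted(
        trader for trader, (large, quick) in counts.items()
        if quick / large >= min_quick_cancel_rate
    )


orders = (
    [Order("A", 5000, 120.0, canceled=True) for _ in range(9)]
    + [Order("A", 5000, 80.0, canceled=False)]
    + [Order("B", 2000, 30000.0, canceled=False) for _ in range(10)]
)
print(flag_possible_spoofers(orders))
```

Legitimate market makers also cancel many orders, which is one reason fixed rules like this are crude and why characterizing the behavior in controlled laboratory models is useful.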

As agents improve, they may learn to exploit arbitrage more maliciously by creating artificial items on the market to mislead traders, or by hacking accounts to report false events that move markets. Wellman’s work aims to produce methods to help control such manipulative behavior.

 

Secrecy in the Financial World

But the secretive nature of finance prevents academics from fully understanding the role of AI.

Wellman explains, “We know they use AI and machine learning to a significant extent, and they are constantly trying to improve their algorithms. We don’t know to what extent things like market manipulation and spoofing are automated right now, but we know that they could be automated and that could lead to something of an arms race between market manipulators and the systems trying to detect and run surveillance for market bad behavior.”

Government agencies – such as the Securities and Exchange Commission – watch financial markets, but “they’re really outgunned as far as the technology goes,” Wellman notes. “They don’t have the expertise or the infrastructure to keep up with how fast things are changing in the industry.”

But academics can help. According to Wellman, “even without doing the trading for money ourselves, we can reverse engineer what must be going on in the financial world and figure out what can happen.”

 

Preparing for Advanced AI

Although Wellman studies current and near-term AI, he’s concerned about the threat of advanced, general AI.

“One thing we can do to try to understand the far-out AI is to get experience with dealing with the near-term AI,” he explains. “That’s why we want to look at regulation of autonomous agents that are very near on the horizon or current. The hope is that we’ll learn some lessons that we can then later apply when the superintelligence comes along.”

AI systems are improving rapidly, and there is intense competition between financial firms to use them. Understanding and tracking AI’s role in finance will help financial markets remain stable and transparent.

“We may not be able to manage this threat with 100% reliability,” Wellman admits, “but I’m hopeful that we can redesign markets to make them safer for the AIs and eliminate some forms of the arms race, and that we’ll be able to get a good handle on preventing some of the most egregious behaviors.”


Silo Busting in AI Research

Artificial intelligence may seem like a computer science project, but if it’s going to successfully integrate with society, then social scientists must be more involved.

Developing an intelligent machine is not merely a problem of modifying algorithms in a lab. These machines must be aligned with human values, and this requires a deep understanding of ethics and the social consequences of deploying intelligent machines.

Getting people with a variety of backgrounds together seems logical enough in theory, but in practice, what happens when computer scientists, AI developers, economists, philosophers, and psychologists try to discuss AI issues? Do any of them even speak the same language?

Social scientists and computer scientists will come at AI problems from very different directions. And if they collaborate, everybody wins. Social scientists can learn about the complex tools and algorithms used in computer science labs, and computer scientists can become more attuned to the social and ethical implications of advanced AI.

Through transdisciplinary learning, both fields will be better equipped to handle the challenges of developing AI, and society as a whole will be safer.

 

Silo Busting

Too often, researchers focus on their narrow area of expertise, rarely reaching out to experts in other fields to solve common problems. AI is no different, with thick walls – sometimes literally – separating the social sciences from the computer sciences. This process of breaking down walls between research fields is often called silo-busting.

If AI researchers largely operate in silos, they may lose opportunities to learn from other perspectives and collaborate with potential colleagues. Scientists might miss gaps in their research or reproduce work already completed by others, because they were secluded away in their silo. This can significantly hamper the development of value-aligned AI.

To bust these silos, Wendell Wallach organized workshops to facilitate knowledge-sharing among leading computer and social scientists. Wallach, a consultant, ethicist, and scholar at Yale University’s Interdisciplinary Center for Bioethics, holds these workshops at The Hastings Center, where he is a senior advisor.

With co-chairs Gary Marchant, Stuart Russell, and Bart Selman, Wallach held the first workshop in April 2016. “The first workshop was very much about exposing people to what experts in all of these different fields were thinking about,” Wallach explains. “My intention was just to put all of these people in a room and hopefully they’d see that they weren’t all reinventing the wheel, and recognize that there were other people who were engaged in similar projects.”

The workshop intentionally brought together experts from a variety of viewpoints, including engineering ethics, philosophy, and resilience engineering, as well as participants from the Institute of Electrical and Electronics Engineers (IEEE), the Office of Naval Research, and the World Economic Forum (WEF). Wallach recounts, “some were very interested in how you implement sensitivity to moral considerations in AI computationally, and others were more interested in how AI changes the societal context.”

Other participants studied how the engineers of these systems may be susceptible to harmful cognitive biases and conflicts of interest, while still others focused on governance issues surrounding AI. Each of these viewpoints is necessary for developing beneficial AI, and The Hastings Center’s workshop gave participants the opportunity to learn from and teach each other.

But silo-busting is not easy. Wallach explains, “everybody has their own goals, their own projects, their own intentions, and it’s hard to hear someone say, ‘maybe you’re being a little naïve about this.’” When researchers operate exclusively in silos, “it’s almost impossible to understand how people outside of those silos did what they did,” he adds.

The intention of the first workshop was not to develop concrete strategies or proposals, but rather to open researchers’ minds to the broad challenges of developing AI with human values. “My suspicion is, the most valuable things that came out of this workshop would be hard to quantify,” Wallach clarifies. “It’s more like people’s minds were being stretched and opened. That was, for me, what this was primarily about.”

The workshop did yield some tangible results. For example, Marchant and Wallach introduced a pilot project for the international governance of AI, and nearly everyone at the workshop agreed to work on it. Since then, the IEEE, the International Committee of the Red Cross, the UN, the World Economic Forum, and other institutions have agreed to become active partners with The Hastings Center in building global infrastructure to ensure that AI and robotics are beneficial.

This transdisciplinary cooperation is a promising sign that Wallach’s efforts are succeeding in strengthening the global response to AI challenges.

 

Value Alignment

Wallach and his co-chairs held a second workshop at the end of October. The participants were mostly scientists, but also included social theorists, a legal scholar, philosophers, and ethicists. The overall goal remained – to bust AI silos and facilitate transdisciplinary cooperation – but this workshop had a narrower focus.

“We made it more about value alignment and machine ethics,” he explains. “The tension in the room was between those who thought the problem [of value alignment] was imminently solvable and those who were deeply skeptical about solving the problem at all.”

In general, Wallach observed that “the social scientists and philosophers tend to overplay the difficulties [of creating AI with full value alignment] and computer scientists tend to underplay the difficulties.”

Wallach believes that while computer scientists will build the algorithms and utility functions for AI, they will need input from social scientists to ensure value alignment. “If a utility function represents 100,000 inputs, social theorists will help the AI researchers understand what those 100,000 inputs are,” he explains. “The AI researchers might be able to come up with 50,000-60,000 on their own, but they’re suddenly going to realize that people who have thought much more deeply about applied ethics are perhaps sensitive to things that they never considered.”

“I’m hoping that enough of [these researchers] learn each other’s language and how to communicate with each other, that they’ll recognize the value they can get from collaborating together,” he says. “I think I see evidence of that beginning to take place.”

 

Moving Forward

Developing value-aligned AI is a monumental task with existential risks. Experts from various perspectives must be willing to learn from each other and adapt their understanding of the issue.

In this spirit, The Hastings Center is leading the charge to bring the various AI silos together. After two successful events that resulted in promising partnerships, Wallach and his co-chairs will hold their third workshop in Spring 2018. And while these workshops are a small effort to facilitate transdisciplinary cooperation on AI, Wallach is hopeful.

“It’s a small group,” he admits, “but it’s people who are leaders in these various fields, so hopefully that permeates through the whole field, on both sides.”


Artificial Intelligence and the King Midas Problem

Value alignment. It’s a phrase that often pops up in discussions about the safety and ethics of artificial intelligence. How can scientists create AI with goals and values that align with those of the people it interacts with?

Very simple robots with very constrained tasks do not need goals or values at all. Although the Roomba’s designers know you want a clean floor, Roomba doesn’t: it simply executes a procedure that the Roomba’s designers predict will work—most of the time. If your kitten leaves a messy pile on the carpet, Roomba will dutifully smear it all over the living room. If we keep programming smarter and smarter robots, then by the late 2020s, you may be able to ask your wonderful domestic robot to cook a tasty, high-protein dinner. But if you forgot to buy any meat, you may come home to a hot meal but find the aforementioned cat has mysteriously vanished. The robot, designed for chores, doesn’t understand that the sentimental value of the cat exceeds its nutritional value.

AI and King Midas

Stuart Russell, a renowned AI researcher, compares the challenge of defining a robot’s objective to the King Midas myth. “The robot,” says Russell, “has some objective and pursues it brilliantly to the destruction of mankind. And it’s because it’s the wrong objective. It’s the old King Midas problem.”

This is one of the big problems in AI safety that Russell is trying to solve. “We’ve got to get the right objective,” he explains, “and since we don’t seem to know how to program it, the right answer seems to be that the robot should learn – from interacting with and watching humans – what it is humans care about.”

Russell works from the assumption that the robot will solve whatever formal problem we define. Rather than assuming that the robot should optimize a given objective, Russell defines the problem as a two-player game (“game” as used by economists, meaning a decision problem with multiple agents) called cooperative inverse reinforcement learning (CIRL).

A CIRL game includes a person and a robot: the robot’s only purpose is to make the person happy, but it doesn’t know what the person wants. Fortunately, it can learn more about what the person wants by observing her behavior. For example, if a robot observed the human’s morning routine, it should discover how important coffee is—not to itself, of course (we don’t want robots drinking coffee), but to the human. Then, it will make coffee for the person without being asked.
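The learning half of this idea can be sketched as Bayesian inference over what the person values. The Python below is a simplification, not the full two-player CIRL game: the three hypotheses and their rewards are hypothetical, and it assumes a common modeling choice in which the person picks actions noisily but in proportion to how much they value them (a “Boltzmann-rational” model).

```python
import math

# Hypothetical hypotheses about what the person values: reward for each morning action.
HYPOTHESES = {
    "values_coffee": {"make_coffee": 2.0, "make_tea": 0.0, "skip_drink": 0.0},
    "values_tea": {"make_coffee": 0.0, "make_tea": 2.0, "skip_drink": 0.0},
    "values_neither": {"make_coffee": 0.0, "make_tea": 0.0, "skip_drink": 2.0},
}


def choice_likelihood(rewards, action, rationality=1.0):
    """P(action | rewards) for a noisily rational (Boltzmann) person."""
    normalizer = sum(math.exp(rationality * r) for r in rewards.values())
    return math.exp(rationality * rewards[action]) / normalizer


def update(belief, action):
    """Bayes' rule: reweight each hypothesis by how well it explains the observed action."""
    unnormalized = {h: p * choice_likelihood(HYPOTHESES[h], action) for h, p in belief.items()}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


belief = {h: 1 / len(HYPOTHESES) for h in HYPOTHESES}  # start with no preference
for observed in ["make_coffee", "make_coffee", "make_tea", "make_coffee"]:
    belief = update(belief, observed)

print({h: round(p, 3) for h, p in belief.items()})
```

Even with one tea morning mixed in, the belief concentrates on coffee without the person ever stating a preference. A full CIRL agent would then choose its own actions, such as making coffee unprompted, to maximize the person’s expected reward under this belief.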

What about the example of King Midas, who thought he wanted as much gold as possible? How does the approach deal with someone stating their wishes incorrectly?

“We don’t have to know what we want. That’s the whole point. It’s hard to know what we want, and what we ask for is evidence for what we really want, but it’s not ground truth,” says Russell.

The robot should learn what people want by watching their actions. Then, if someone asks for something that’s technically incorrect, the robot will understand their underlying wish. Russell adds, “The robot—if it had been there—would have told Midas that he didn’t really want everything turned to gold, maybe just a few choice objects that he might point at from time to time and say the magic word.”

AI Off-Switch

Russell and his Berkeley colleagues also recently announced further progress toward ensuring safe AI, with a paper on keeping an AI’s off-switch accessible. An off-switch is often suggested as a way to guarantee that a robot doesn’t harm humans – if the robot goes bad, just turn it off! But what if the AI anticipates that and disables its off-switch?

Russell and his team propose solving this by building uncertainty into the robot’s objective. They created another CIRL game in which the robot can either announce its intentions, giving the human a chance to switch it off, or plunge ahead with its plan, bypassing human oversight.

They showed that a robot with a fixed objective will take the second choice and disable its off-switch. But if there is uncertainty about the objective, the robot will reason that a human decision to switch the robot off would imply the announced intention is undesirable. Because its goal is to make the human happy, the robot accepts being switched off. The robot has a positive incentive not to disable the off-switch, and that incentive is directly tied to the robot’s uncertainty about the human’s true objective.
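The incentive can be illustrated with a simplified version of the game. The sketch below makes three assumptions. First, the robot’s plan has an uncertain utility U for the human. Second, the human is rational and switches the robot off exactly when U is negative. Third, being switched off is worth 0. Under these assumptions, acting directly is worth E[U], and deferring to the human is worth E[max(U, 0)], which is never smaller. The gap vanishes when the robot is certain about U, which is why a robot with a fixed objective gains nothing from leaving the switch alone. The utilities and probabilities are hypothetical.

```python
def option_values(utilities, probs):
    """Values of the robot's options in a simplified off-switch game.

    utilities/probs describe the robot's belief about the utility U (to the human)
    of its plan. Assumes a rational human who switches the robot off exactly when U < 0.
    """
    act = sum(p * u for u, p in zip(utilities, probs))              # plunge ahead: E[U]
    defer = sum(p * max(u, 0.0) for u, p in zip(utilities, probs))  # announce, human decides: E[max(U, 0)]
    off = 0.0                                                        # shut down immediately
    return {"act": act, "defer": defer, "off": off}

def incentive_to_defer(utilities, probs):
    """How much the robot gains by leaving the off-switch in human hands."""
    v = option_values(utilities, probs)
    return v["defer"] - max(v["act"], v["off"])

uncertain = incentive_to_defer([2.0, -1.0], [0.6, 0.4])  # the plan might be bad
certain = incentive_to_defer([2.0], [1.0])               # fixed, known objective
```

Under the uncertain belief, deferring is worth about 1.2 against about 0.8 for acting, so the robot prefers to let the human decide. With a point-mass belief, the incentive drops to zero.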

Ensuring AI Safety

In addition to his research, Russell is also one of the most vocal and active AI safety researchers concerned with ensuring a stronger public understanding of the potential issues surrounding AI development.

He recently co-authored a rebuttal to an article in the MIT Technology Review, which claimed that real AI scientists weren’t worried about the existential threat of AI. Russell and his co-author summed up why it’s better to be cautious and careful than just assume all will turn out for the best:

“Our experience with Chernobyl suggests it may be unwise to claim that a powerful technology entails no risks. It may also be unwise to claim that a powerful technology will never come to fruition. On September 11, 1933, Lord Rutherford, perhaps the world’s most eminent nuclear physicist, described the prospect of extracting energy from atoms as nothing but ‘moonshine.’ Less than 24 hours later, Leo Szilard invented the neutron-induced nuclear chain reaction; detailed designs for nuclear reactors and nuclear weapons followed a few years later. Surely it is better to anticipate human ingenuity than to underestimate it, better to acknowledge the risks than to deny them. … [T]he risk [of AI] arises from the unpredictability and potential irreversibility of deploying an optimization process more intelligent than the humans who specified its objectives.”

This summer, Russell received a grant of over $5.5 million from the Open Philanthropy Project for a new research center, the Center for Human-Compatible Artificial Intelligence, in Berkeley. Among the primary objectives of the Center will be to study this problem of value alignment, to continue his efforts toward provably beneficial AI, and to ensure we don’t make the same mistakes as King Midas.

“Look,” he says, “if you were King Midas, would you want your robot to say, ‘Everything turns to gold? OK, boss, you got it.’ No! You’d want it to say, ‘Are you sure? Including your food, drink, and relatives? I’m pretty sure you wouldn’t like that. How about this: you point to something and say ‘Abracadabra Aurificio’ or something, and then I’ll turn it to gold, OK?’”

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.

The Problem of Defining Autonomous Weapons

What, exactly, is an autonomous weapon? For the general public, the phrase is often used synonymously with killer robots and triggers images of the Terminator. But for the military, the definition of an autonomous weapons system, or AWS, is deceptively simple.

The United States Department of Defense defines an AWS as “a weapon system that, once activated, can select and engage targets without further intervention by a human operator. This includes human-supervised autonomous weapon systems that are designed to allow human operators to override operation of the weapon system, but can select and engage targets without further human input after activation.”

Such a weapon can be used in any domain (land, air, sea, space, cyber, or any combination thereof) and encompasses significantly more than just the platform that fires the munition. The system bundles several capabilities, such as identifying, tracking, and firing on targets, and each of these may involve a different level of human interaction and input.

Heather Roff, a research scientist at The Global Security Initiative at Arizona State University and a senior research fellow at the University of Oxford, suggests that even the basic terminology of the DoD’s definition is unclear.

“This definition is problematic because we don’t really know what ‘select’ means here. Is it ‘detect’ or ‘select’?” she asks. Roff also notes that another definitional problem arises because, in many instances, the difference between an autonomous weapon (acting independently) and an automated weapon (pre-programmed to act automatically) is not clear.

 

A Database of Weapons Systems

State parties to the UN’s Convention on Conventional Weapons (CCW) also grapple with what makes a weapon autonomous rather than merely automated. During the last three years of discussion at the CCW’s Informal Meetings of Experts, participants typically referred to only two or three presently deployed weapons systems that appear to be AWS, such as the Israeli Harpy or the United States’ Counter Rocket and Mortar system.

To address this, the International Committee of the Red Cross requested more data on presently deployed systems: which weapons systems states currently use and which projects are under development. Roff took up the call to action. She pored over publicly available data from a variety of sources and compiled a database of 284 weapons systems. She wanted to know what capacities presently deployed systems already had and whether these were or were not “autonomous.”

“The dataset looks at the top five weapons exporting countries, so that’s Russia, China, the United States, France and Germany,” says Roff. “I’m looking at major sales and major defense industry manufacturers from each country. And then I look at all the systems that are presently deployed by those countries that are manufactured by those top manufacturers, and I code them along a series of about 20 different variables.”

These variables include capabilities like navigation, homing, target identification, firing, etc., and for each variable, Roff coded a weapon as either having the capacity or not. Roff then created a series of three indices to bundle the various capabilities: self-mobility, self-direction, and self-determination. Self-mobility capabilities allow a system to move by itself, self-direction relates to target identification, and self-determination indexes the abilities that a system may possess in relation to goal setting, planning, and communication. Most “smart” weapons have high self-direction and self-mobility, but few, if any, have self-determination capabilities.
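One way to picture Roff’s coding scheme is as a table of 0/1 capability flags for each system, bundled into the three indices. The sketch below assumes, purely for illustration, that each index is the fraction of its capabilities a system has. The capability names and their grouping are also hypothetical: the article names only some of the roughly 20 variables and does not say how the indices are aggregated.

```python
# Hypothetical grouping of capability variables into the three indices.
INDICES = {
    "self_mobility": ["navigation", "homing"],
    "self_direction": ["target_identification", "target_tracking", "firing"],
    "self_determination": ["goal_setting", "planning", "communication"],
}

def index_scores(coded):
    """coded maps capability name -> 1 (has the capacity) or 0 (does not).
    Missing capabilities count as 0. Each index is scored, as an assumption,
    as the fraction of its capabilities that the system has."""
    return {
        index: sum(coded.get(cap, 0) for cap in caps) / len(caps)
        for index, caps in INDICES.items()
    }

# A hypothetical guided munition, coded capability by capability.
munition = {"navigation": 1, "homing": 1, "target_identification": 1, "target_tracking": 1}
scores = index_scores(munition)
```

This hypothetical system scores 1.0 on self-mobility, about 0.67 on self-direction, and 0 on self-determination. That profile is in line with the article’s observation about most “smart” weapons.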

As Roff explains in a recent Foreign Policy post, the data shows that “the emerging trend in autonomy has less to do with the hardware and more on the areas of communications and target identification. What we see is a push for better target identification capabilities, identification friend or foe (IFF), as well as learning. Systems need to be able to adapt, to learn, and to change or update plans while deployed. In short, the systems need to be tasked with more things and vaguer tasks.” Thus newer systems will need greater self-determination capabilities.

 

The Human in the Loop

But understanding what the weapons systems can do is only one part of the equation. In most systems, humans still maintain varying levels of control, and the military often claims that a human will always be “in the loop.” That is, a human will always have some element of meaningful control over the system. But this leads to another definitional problem: just what is meaningful human control?

Roff argues that this idea of keeping a human “in the loop” isn’t just “unhelpful,” but that it may be “hindering our ability to think about what’s wrong with autonomous systems.” She references what the UK Ministry of Defence calls the Empty Hangar Problem: no one expects to walk into a military airplane hangar and discover that the autonomous plane spontaneously decided, on its own, to go to war.

“That’s just not going to happen,” Roff says. “These systems are always going to be used by humans, and humans are going to decide to use them.” But thinking about humans in some loop, she contends, means that any difficulties with autonomy get pushed aside.

Earlier this year, Roff worked with Article 36, which coined the phrase “meaningful human control,” to establish a clearer definition of the term. They published a concept paper, Meaningful Human Control, Artificial Intelligence and Autonomous Weapons, which offered guidelines for delegates at the 2016 CCW Meeting of Experts on Lethal Autonomous Weapons Systems.

In the paper, Roff and Richard Moyes outlined key elements – such as predictable, reliable and transparent technology, accurate user information, a capacity for timely human action and intervention, human control during attacks, etc. – for determining whether an AWS allows for meaningful human control.

“You can’t offload your moral obligation to a non-moral agent,” says Roff. “So that’s where I think our work on meaningful human control is: a human commander has a moral obligation to undertake precaution and proportionality in each attack.” The weapon system cannot do it for the human.

Researchers and the international community are only beginning to tackle the ethical issues that arise from AWSs. Clearly defining the weapons systems and the role humans will continue to play is one small part of a very big problem. Roff will continue to work with the international community to establish better-defined goals and guidelines.

“I’m hoping that the doctrine and the discussions that are developing internationally and through like-minded states will actually guide normative generation of how to use or not use such systems,” she says.

Heather Roff also spoke about this work on an FLI podcast.


Complex AI Systems Explain Their Actions


In the future, service robots equipped with artificial intelligence (AI) are bound to be a common sight. These bots will help people navigate crowded airports, serve meals, or even schedule meetings.

As these AI systems become more integrated into daily life, it is vital to find an efficient way to communicate with them. It is obviously more natural for a human to speak in plain language than in a string of code. Further, as the relationship between humans and robots grows, it will be necessary to engage in conversations, rather than just give orders.

This human-robot interaction is what Manuela M. Veloso’s research is all about. Veloso, a professor at Carnegie Mellon University, has focused her research on CoBots, autonomous indoor mobile service robots that transport items, guide visitors to building locations, and traverse the halls and elevators. The CoBot robots have been navigating autonomously for several years now and have traveled more than 1,000 km. These accomplishments have enabled the research team to pursue a new direction: novel human-robot interaction.

“If you really want these autonomous robots to be in the presence of humans and interacting with humans, and being capable of benefiting humans, they need to be able to talk with humans,” Veloso says.

 

Communicating With CoBots

Veloso’s CoBots are capable of autonomous localization and navigation in the Gates-Hillman Center using WiFi, LIDAR, and/or a Kinect sensor (yes, the same type used for video games).

The robots navigate by detecting walls as planes, which they match to the known maps of the building. Other objects, including people, are detected as obstacles, so navigation is safe and robust. Overall, the CoBots are good navigators and are quite consistent in their motion. In fact, the team noticed the robots could wear down the carpet as they traveled the same path numerous times.

Because the robots are autonomous, and therefore capable of making their own decisions, they are out of sight for large amounts of time while they navigate the multi-floor buildings.

The research team began to wonder about this unaccounted time. How were the robots perceiving the environment and reaching their goals? How was the trip? What did they plan to do next?

“In the future, I think that incrementally we may want to query these systems on why they made some choices or why they are making some recommendations,” explains Veloso.

The research team is currently working on the question of why the CoBots took the route they did while autonomous. The team wanted to give the robots the ability to record their experiences and then transform the data about their routes into natural language. In this way, the bots could communicate with humans and reveal their choices and hopefully the rationale behind their decisions.

 

Levels of Explanation

The “internals” underlying the functions of any autonomous robot are based entirely on numerical computations, not natural language. The CoBot robots, for example, compute the distance to walls and assign velocities to their motors to move to specific map coordinates.

Asking an autonomous robot for a non-numerical explanation is complex, says Veloso. Furthermore, the answer can be provided in many potential levels of detail.

“We define what we call the ‘verbalization space’ in which this translation into language can happen with different levels of detail, with different levels of locality, with different levels of specificity.”

For example, a developer asking a robot to detail its journey might expect a lengthy retelling, including details such as battery levels. But a random visitor might just want to know how long it takes to get from one office to another.

Therefore, the research is not just about translating data into language, but also about acknowledging that the robots need to explain things in more or less detail. If a human asks for more detail, the request triggers CoBot “to move” to a more detailed point in the verbalization space.
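A toy version of moving through the verbalization space might look like the sketch below. It models only one of the dimensions Veloso names, the level of detail, and it uses a hypothetical route log. The field names, room numbers, and three-level scale are assumptions, not the CoBots’ actual representation.

```python
# Hypothetical route log; the fields are illustrative, not the CoBots' actual format.
ROUTE = [
    {"from": "office 7002", "to": "the 7th-floor elevator", "seconds": 40, "battery": 0.91},
    {"from": "the 7th-floor elevator", "to": "the 3rd-floor elevator", "seconds": 65, "battery": 0.90},
    {"from": "the 3rd-floor elevator", "to": "office 3201", "seconds": 55, "battery": 0.88},
]
MAX_DETAIL = 2

def verbalize(route, detail):
    """Narrate a route at one point on a single 'level of detail' axis."""
    total_s = sum(seg["seconds"] for seg in route)
    if detail <= 0:
        # Coarse summary, e.g. for a visitor who only wants the travel time.
        return (f"I went from {route[0]['from']} to {route[-1]['to']} "
                f"in about {round(total_s / 60)} minutes.")
    parts = []
    for seg in route:
        text = f"from {seg['from']} to {seg['to']}"
        if detail >= 2:
            # Developer-level detail: timing and battery reading per segment.
            text += f" ({seg['seconds']} s, battery at {seg['battery']:.0%})"
        parts.append(text)
    return "I went " + ", then ".join(parts) + "."

def more_detail(detail):
    """A request for more detail moves one step deeper, capped at MAX_DETAIL."""
    return min(detail + 1, MAX_DETAIL)
```

A visitor would get the level-0 summary. Each request for more detail then moves the narration one step toward the developer-level log.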

“We are trying to understand how to empower the robots to be more trustable through these explanations, as they attend to what the humans want to know,” says Veloso. The ability to generate explanations, especially at multiple levels of detail, will become more important as AI systems make more complex decisions whose reasoning humans will find harder to infer. Such systems will need to be more transparent.

For example, if you go to a doctor’s office and the AI there makes a recommendation about your health, you may want to know why it came to this decision, or why it recommended one medication over another.

Currently, Veloso’s research focuses on getting the robots to generate these explanations in plain language. The next step will be to have the robots incorporate natural language when humans provide them with feedback. “[The CoBot] could say, ‘I came from that way,’ and you could say, ‘well next time, please come through the other way,’” explains Veloso.

These sorts of corrections could be programmed into the code, but Veloso believes that “trustability” in AI systems will benefit from our ability to dialogue with, query, and correct their autonomy. She and her team aim to contribute to a multi-robot, multi-human symbiotic relationship, in which robots and humans coordinate and cooperate according to their respective limitations and strengths.

“What we’re working on is to really empower people – a random person who meets a robot – to still be able to ask things about the robot in natural language,” she says.

In the future, when we have more and more AI systems that are able to perceive the world, make decisions, and support human decision-making, the ability to engage in these types of conversations will be essential.


Who is Responsible for Autonomous Weapons?

Consider the following wartime scenario: Hoping to spare the lives of soldiers, a country deploys an autonomous weapon to wipe out an enemy force. This robot has demonstrated military capabilities that far exceed even the best soldiers, but when it hits the ground, it gets confused. It can’t distinguish the civilians from the enemy soldiers and begins taking innocent lives. The military generals desperately try to stop the robot, but by the time they succeed it has already killed dozens.

Who is responsible for this atrocity? Is it the commanders who deployed the robot, the designers and manufacturers of the robot, or the robot itself?

 

Liability: Autonomous Systems

As artificial intelligence improves, governments may turn to autonomous weapons — like military robots — in order to gain the upper hand in armed conflict. These weapons can navigate environments on their own and make their own decisions about who to kill and who to spare. While the example above may never occur, unintended harm is inevitable. Considering these scenarios helps formulate important questions that governments and researchers must jointly consider, namely:

How do we hold human beings accountable for the actions of autonomous systems? And how is justice served when the killer is essentially a computer?

As it turns out, there is no straightforward answer to this dilemma. When a human soldier commits an atrocity and kills innocent civilians, that soldier is held accountable. But when autonomous weapons do the killing, it’s difficult to blame them for their mistakes.

An autonomous weapon’s “decision” to murder innocent civilians is like a computer’s “decision” to freeze the screen and delete your unsaved project. Frustrating as a frozen computer may be, people rarely think the computer intended to complicate their lives.

Intention must be demonstrated to prosecute someone for a war crime. Autonomous weapons may show outward signs of decision-making and intention, but they still run on code that’s just as impersonal as the code that glitches and freezes a computer screen. Like computers, these systems are not legal or moral agents, and it’s not clear how to hold them accountable, or whether they can be held accountable at all, for their mistakes.

So who assumes the blame when autonomous weapons take innocent lives? Should they even be allowed to kill at all?

 

Liability: from Self-Driving Cars to Autonomous Weapons

Peter Asaro, a philosopher of science, technology, and media at The New School in New York City, has been working on addressing these fundamental questions of responsibility and liability with all autonomous systems, not just weapons. By exploring fundamental concepts of autonomy, agency, and liability, he intends to develop legal approaches for regulating the use of autonomous systems and the harm they cause.

At a recent conference on the Ethics of Artificial Intelligence, Asaro discussed the liability issues surrounding the application of AI to weapons systems. He explained, “AI poses threats to international law itself — to the norms and standards that we rely on to hold people accountable for [decisions, and to] hold states accountable for military interventions — as [people are] able to blame systems for malfunctioning instead of taking responsibility for their decisions.”

The legal system will need to reconsider who is held liable to ensure that justice is served when an accident happens. Asaro argues that the moral and legal issues surrounding autonomous weapons are very different from the issues surrounding other autonomous machines, such as self-driving cars.

Though researchers still expect the occasional fatal accident to occur with self-driving cars, these autonomous vehicles are designed with safety in mind. One of the goals of self-driving cars is to save lives. “The fundamental difference is that with any kind of weapon, you’re intending to do harm, so that carries a special legal and moral burden,” Asaro explains. “There is a moral responsibility to ensure that [the weapon is] only used in legitimate and appropriate circumstances.”

Furthermore, liability with autonomous weapons is much more ambiguous than it is with self-driving cars and other domestic robots.

With self-driving cars, for example, bigger companies like Volvo intend to embrace strict liability – where the manufacturers assume full responsibility for accidental harm. Although it is not clear how all manufacturers will be held accountable for autonomous systems, strict liability and threats of class-action lawsuits incentivize manufacturers to make their product as safe as possible.

Warfare, on the other hand, is a much messier situation.

“You don’t really have liability in war,” says Asaro. “The US military could sue a supplier for a bad product, but as a victim who was wrongly targeted by a system, you have no real legal recourse.”

Autonomous weapons only complicate this. “These systems become more unpredictable as they become more sophisticated, so psychologically commanders feel less responsible for what those systems do. They don’t internalize responsibility in the same way,” Asaro explained at the Ethics of AI conference.

To ensure that commanders internalize responsibility, Asaro suggests that “the system has to allow humans to actually exercise their moral agency.”

That is, commanders must demonstrate that they can fully control the system before they use it in warfare. Once they demonstrate control, it can become clearer who can be held accountable for the system’s actions.

 

Preparing for the Unknown

Behind these concerns about liability lies the overarching concern that autonomous machines might act in ways that humans never intended. Asaro asks: “When these systems become more autonomous, can the owners really know what they’re going to do?”

Even the programmers and manufacturers may not know what their machines will do. The purpose of developing autonomous machines is so they can make decisions themselves – without human input. And as the programming inside an autonomous system becomes more complex, people will increasingly struggle to predict the machine’s actions.

Companies and governments must be prepared to handle the legal complexities of a domestic or military robot or system causing unintended harm. Ensuring justice for those who are harmed may not be possible without a clear framework for liability.

Asaro explains, “We need to develop policies to ensure that useful technologies continue to be developed, while ensuring that we manage the harms in a just way. A good start would be to prohibit automating decisions over the use of violent and lethal force, and to focus on managing the safety risks in beneficial autonomous systems.”

Peter Asaro also spoke about this work on an FLI podcast. You can learn more about his work at http://www.peterasaro.org.


Cybersecurity and Machine Learning

When it comes to cybersecurity, no nation can afford to slack off. If a nation’s defense systems cannot anticipate how an attacker will try to fool them, then an especially clever attack could expose military secrets or use disguised malware to cause major networks to crash.

A nation’s defense systems must keep up with the constant threat of attack, but this is a difficult and never-ending process. It seems that the defense is always playing catch-up.

Ben Rubinstein, a professor at the University of Melbourne in Australia, asks: “Wouldn’t it be good if we knew what the malware writers are going to do next, and to know what type of malware is likely to get through the filters?”

In other words, what if defense systems could learn to anticipate how attackers will try to fool them?

 

Adversarial Machine Learning

In order to address this question, Rubinstein studies how to prepare machine-learning systems to catch adversarial attacks. In the game of national cybersecurity, these adversaries are often individual hackers or governments who want to trick machine-learning systems for profit or political gain.

Nations have become increasingly dependent on machine-learning systems to protect against such adversaries. Unaided by humans, machine-learning systems in anti-malware and facial recognition software have the ability to learn and improve their function as they encounter new data. As they learn, they become better at catching adversarial attacks.

Machine-learning systems are generally good at catching adversaries, but they are not completely immune to threats, and adversaries are constantly looking for new ways to fool them. Rubinstein says, “Machine learning works well if you give it data like it’s seen before, but if you give it data that it’s never seen before, there’s no guarantee that it’s going to work.”

With adversarial machine learning, security agencies address this weakness by presenting the system with different types of malicious data to test the system’s filters. The system then digests this new information and learns how to identify and capture malware from clever attackers.
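The retraining loop can be illustrated with a deliberately tiny filter: a decision stump that flags a file if a single binary feature is present. In the sketch below, the features and data are hypothetical. The attacker clears exactly the feature the filter relies on, and the evasive samples slip through. Retraining on those samples, labeled as malicious, moves the filter to a feature the attacker did not touch. Real anti-malware models and attacks are far richer than this; the example only shows the general shape of adversarial training.

```python
# Hypothetical binary features extracted from a file.
FEATURES = ("packed", "crypto_api", "bad_signature")

def train_stump(data):
    """Return the index of the feature whose rule 'flag if present' is most
    accurate on data, a list of (feature_tuple, label) with label 1 = malicious."""
    def accuracy(i):
        return sum((x[i] == 1) == (label == 1) for x, label in data) / len(data)
    return max(range(len(FEATURES)), key=accuracy)

def flags(feature, x):
    """True if the stump using `feature` would flag sample x as malicious."""
    return x[feature] == 1

def evade(feature, x):
    """Attacker's move: switch off exactly the feature the filter relies on."""
    x = list(x)
    x[feature] = 0
    return tuple(x)

benign = [((0, 0, 0), 0), ((0, 0, 1), 0), ((0, 1, 0), 0)]
malicious = [((1, 1, 0), 1), ((1, 1, 1), 1), ((1, 1, 0), 1)]

stump = train_stump(benign + malicious)                # learns "packed"
evasive = [(evade(stump, x), 1) for x, _ in malicious]  # disguised malware
retrained = train_stump(benign + malicious + evasive)   # adversarial training
```

The original stump misses all three evasive samples. The retrained stump switches to `crypto_api` and catches them, at the cost of also flagging the one benign file that uses the crypto API. That trade-off is typical of hardening a filter.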

 

Security Evaluation of Machine-Learning Systems

Rubinstein’s project is called “Security Evaluation of Machine-Learning Systems”, and his ultimate goal is to develop a software tool that companies and government agencies can use to test their defenses. Any company or agency that uses machine-learning systems could run his software against their system. Rubinstein’s tool would attack and try to fool the system in order to expose the system’s vulnerabilities. In doing so, his tool anticipates how an attacker could slip by the system’s defenses.

The software would evaluate existing machine-learning systems and find weak spots that adversaries might try to exploit – similar to how one might defend a castle.

“We’re not giving you a new castle,” Rubinstein says, “we’re just going to walk around the perimeter and look for holes in the walls and weak parts of the castle, or see where the moat is too shallow.”

By analyzing many different machine-learning systems, his software program will pick up on trends and be able to advise security agencies to either use a different system or bolster the security of their existing system. In this sense, his program acts as a consultant for every machine-learning system.

Consider a program that does facial recognition. This program would use machine learning to identify faces and catch adversaries that pretend to look like someone else.

Rubinstein explains: “Our software aims to automate this security evaluation so that it takes an image of a person and a program that does facial recognition, and it will tell you how to change its appearance so that it will evade detection or change the outcome of machine learning in some way.”

This is called a mimicry attack – when an adversary makes one instance (one face) look like another, and thereby fools a system.
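A mimicry attack can be sketched against a toy recognizer that assigns a face to the nearest enrolled template. The code below works on hypothetical 2-D “embeddings.” It blends a source face toward a target’s template and reports the smallest blend, on a coarse grid, that the recognizer accepts as the target. Real attacks on face recognition typically search for much smaller, less visible changes, for example by following the model’s gradients. This sketch only illustrates the idea of making one instance pass as another.

```python
import math

# Hypothetical 2-D "face embeddings" for enrolled people.
TEMPLATES = {"alice": (0.0, 0.0), "bob": (4.0, 0.0), "carol": (0.0, 4.0)}

def recognize(x):
    """Nearest-template recognizer: returns the closest enrolled identity."""
    return min(TEMPLATES, key=lambda name: math.dist(x, TEMPLATES[name]))

def mimicry(x, target, steps=20):
    """Blend x toward the target's template and return (candidate, t) for the
    smallest blend t on a grid of `steps` that the recognizer reports as `target`."""
    goal = TEMPLATES[target]
    for i in range(steps + 1):
        t = i / steps
        candidate = tuple((1 - t) * a + t * b for a, b in zip(x, goal))
        if recognize(candidate) == target:
            return candidate, t
    return None, None

source = (0.5, 0.5)                      # recognized as alice
disguised, blend = mimicry(source, "bob")
```

Here the source face is recognized as alice, and a blend of 0.45 toward bob’s template is enough for the recognizer to accept it as bob.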

To make this example easier to visualize, Rubinstein’s group built a program that demonstrates how to change a face’s appearance to fool a machine-learning system into thinking that it is another face.

In the image below, the two faces don’t look alike, but the left image has been modified so that the machine-learning system thinks it is the same as the image on the right. This example provides insight into how adversaries can fool machine-learning systems by exploiting quirks.


When Rubinstein’s software fools a system with a mimicry attack, security personnel can then take that information and retrain their program to establish more effective security when the stakes are higher.

 

Minimizing the Attacker’s Advantage

While Rubinstein’s software will help to secure machine-learning systems against adversarial attacks, he has no illusions about the natural advantages that attackers enjoy. It will always be easier to attack a castle than to defend it, and the same holds true for a machine-learning system. This is called the ‘asymmetry of cyberwarfare.’

“The attacker can come in from any angle. It only needs to succeed at one point, but the defender needs to succeed at all points,” says Rubinstein.
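One way to quantify this asymmetry is to assume, as a simplification, that a system has n independent entry points and that each one resists a given attack with probability p. The defender must hold at every point, while the attacker needs only one success:

```latex
P(\text{defense holds}) = p^{n}, \qquad
P(\text{attacker succeeds somewhere}) = 1 - p^{n}
```

For example, with illustrative values p = 0.99 and n = 100, p^n ≈ 0.366. A defense that is 99% reliable at each point is therefore breached with probability of roughly 0.63.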

In general, Rubinstein worries that the tools available to test machine-learning systems are theoretical in nature, and put too much responsibility on the security personnel to understand the complex math involved. A researcher might redo the mathematical analysis for every new learning system, but security personnel are unlikely to have the time or resources to keep up.

Rubinstein aims to “bring what’s out there in theory and make it more applied and more practical and easy for anyone who’s using machine learning in a system to evaluate the security of their system.”

With his software, Rubinstein intends to help level the playing field between attackers and defenders. By giving security agencies better tools to test and adapt their machine-learning systems, he hopes to improve the ability of security personnel to anticipate and guard against cyberattacks.


Supervising AI Growth

When Apple released its software application, Siri, in 2011, iPhone users had high expectations for their intelligent personal assistants. Yet despite its impressive and growing capabilities, Siri often makes mistakes. The software’s imperfections highlight the clear limitations of current AI: today’s machine intelligence can’t understand the varied and changing needs and preferences of human life.

However, as artificial intelligence advances, experts believe that intelligent machines will eventually – and probably soon – understand the world better than humans. While it might be easy to understand how or why Siri makes a mistake, figuring out why a superintelligent AI made the decision it did will be much more challenging.

If humans cannot understand and evaluate these machines, how will they control them?

Paul Christiano, a Ph.D. student in computer science at UC Berkeley, has been working on addressing this problem. He believes that to ensure safe and beneficial AI, researchers and operators must learn to measure how well intelligent machines do what humans want, even as these machines surpass human intelligence.

 

Semi-supervised Learning

The most obvious way to supervise the development of an AI system also happens to be the hard way. As Christiano explains: “One way humans can communicate what they want, is by spending a lot of time digging down on some small decision that was made [by an AI], and try to evaluate how good that decision was.”

But while this is theoretically possible, the human researchers would never have the time or resources to evaluate every decision the AI made. “If you want to make a good evaluation, you could spend several hours analyzing a decision that the machine made in one second,” says Christiano.

For example, suppose an amateur chess player wants to understand a better chess player’s previous move. Merely spending a few minutes evaluating this move won’t be enough, but if she spends a few hours she could consider every alternative and develop a meaningful understanding of the better player’s moves.

Fortunately for researchers, they don’t need to evaluate every decision an AI makes in order to be confident in its behavior. Instead, researchers can choose “the machine’s most interesting and informative decisions, where getting feedback would most reduce our uncertainty,” Christiano explains.

“Say your phone pinged you about a calendar event while you were on a phone call,” he elaborates. “That event is not analogous to anything else it has done before, so it’s not sure whether it is good or bad.” Because of this uncertainty, the phone might send a transcript of its decisions to an evaluator, at Google, for example. The evaluator would study the transcript, ask the phone’s owner how he felt about the ping, and determine whether pinging users during phone calls is a desirable or undesirable action. By providing this feedback, Google teaches the phone when it should interrupt users in the future.
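Picking out the decisions “where getting feedback would most reduce our uncertainty” resembles what the active-learning literature calls uncertainty sampling. The sketch below assumes the assistant attaches its own estimate of how likely the user is to approve each decision, and forwards only the most uncertain ones (approval probability nearest 50%) to a human. The function names, decision labels, and probabilities are all hypothetical illustrations, not part of Christiano’s proposal.

```python
import math
from typing import List, Tuple

def entropy(p: float) -> float:
    """Binary entropy in bits: 1.0 when p = 0.5 (maximally unsure), 0.0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def select_for_review(decisions: List[Tuple[str, float]], budget: int) -> List[str]:
    """Return the `budget` decisions whose predicted approval is most uncertain.

    Each decision is (description, p_approve), where p_approve is the system's
    own (hypothetical) estimate that the user would approve of it.
    """
    ranked = sorted(decisions, key=lambda d: entropy(d[1]), reverse=True)
    return [name for name, _ in ranked[:budget]]

# Hypothetical log of an assistant's recent decisions.
log = [
    ("silenced notification during meeting", 0.97),
    ("pinged calendar event during phone call", 0.52),
    ("suggested leaving early for traffic", 0.88),
    ("auto-declined unknown caller", 0.35),
]
print(select_for_review(log, budget=2))
# → ['pinged calendar event during phone call', 'auto-declined unknown caller']
```

The confident decisions never reach a human evaluator, which is what keeps the evaluators’ limited time focused where it teaches the system the most.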

This active learning process is an efficient method for humans to train AIs, but what happens when humans need to evaluate AIs that exceed human intelligence?

Consider a computer that is mastering chess. How could a human give appropriate feedback to the computer if the human has not mastered chess? The human might criticize a move that the computer makes, only to realize later that the machine was correct.

With increasingly intelligent phones and computers, a similar problem is bound to occur. Eventually, Christiano explains, “we need to handle the case where AI systems surpass human performance at basically everything.”

If a phone knows much more about the world than its human evaluators, then the evaluators cannot rely on their own judgment alone. They will need to “enlist the help of more AI systems,” Christiano explains.


Using AIs to Evaluate Smarter AIs

When a phone pings a user while he is on a call, the user’s reaction to this decision is crucial in determining whether the phone will interrupt users during future phone calls. But, as Christiano argues, “if a more advanced machine is much better than human users at understanding the consequences of interruptions, then it might be a bad idea to just ask the human ‘should the phone have interrupted you right then?’” The human might express annoyance at the interruption, but the machine might know better and understand that the interruption, however annoying, was necessary to keep the user’s life running smoothly.

In these situations, Christiano proposes that human evaluators use other intelligent machines to do the grunt work of evaluating an AI’s decisions. In practice, a less capable System 1 would be in charge of evaluating the more capable System 2. Even though System 2 is smarter, System 1 can process a large amount of information quickly, and can understand how System 2 should revise its behavior. The human trainers would still provide input and oversee the process, but their role would be limited.

This training process would help Google understand how to create a safer and more intelligent AI – System 3 – which the human researchers could then train using System 2.

Christiano explains that these intelligent machines would be like little agents that carry out tasks for humans. Siri already has this limited ability to take human input and figure out what the human wants, but as AI technology advances, machines will learn to carry out complex tasks that humans cannot fully understand.


Can We Ensure that an AI Holds Human Values?

As Google and other tech companies continue to improve their intelligent machines with each evaluation, the human trainers will fulfill a smaller role. Eventually, Christiano explains, “it’s effectively just one machine evaluating another machine’s behavior.”

Ideally, “each time you build a more powerful machine, it effectively models human values and does what humans would like,” says Christiano. But he worries that these machines may stray from human values as they surpass human intelligence. To put this in human terms: a complex intelligent machine would resemble a large organization of humans. If the organization does tasks that are too complex for any individual human to understand, it may pursue goals that humans wouldn’t like.

In order to address these control issues, Christiano is working on an “end-to-end description of this machine learning process, fleshing out key technical problems that seem most relevant.” His research will help bolster the understanding of how humans can use AI systems to evaluate the behavior of more advanced AI systems. If his work succeeds, it will be a significant step in building trustworthy artificial intelligence.



How Can AI Learn to Be Safe?

As artificial intelligence improves, machines will soon be equipped with intellectual and practical capabilities that surpass the smartest humans. But not only will machines be more capable than people, they will also be able to make themselves better. That is, these machines will understand their own design and how to improve it – or they could create entirely new machines that are even more capable.

The human creators of AIs must be able to trust these machines to remain safe and beneficial even as they self-improve and adapt to the real world.

Recursive Self-Improvement

This idea of an autonomous agent making increasingly better modifications to its own code is called recursive self-improvement. Through recursive self-improvement, a machine can adapt to new circumstances and learn how to deal with new situations.

To a certain extent, the human brain does this as well. As a person develops and repeats new habits, connections in the brain can change. The connections grow stronger and more effective over time, making the new, desired action (e.g. changing one’s diet or learning a new language) easier to perform. In machines, though, this ability to self-improve can be much more drastic.

An AI agent can process information much faster than a human, and if it does not properly understand how its actions impact people, then its self-modifications could quickly fall out of line with human values.

For Bas Steunebrink, a researcher at the Swiss AI lab IDSIA, solving this problem is a crucial step toward achieving safe and beneficial AI.

Building AI in a Complex World

Because the world is so complex, many researchers begin AI projects by developing AI in carefully controlled environments. Then they create mathematical proofs that can assure them that the AI will achieve success in this specified space.

But Steunebrink worries that this approach puts too much responsibility on the designers and too much faith in the proof, especially when dealing with machines that can learn through recursive self-improvement. He explains, “We cannot accurately describe the environment in all its complexity; we cannot foresee what environments the agent will find itself in in the future; and an agent will not have enough resources (energy, time, inputs) to do the optimal thing.”

If the machine encounters an unforeseen circumstance, the proof the designer relied on in the controlled environment may no longer apply. Says Steunebrink, “We have no assurance about the safe behavior of the [AI].”

Experience-based Artificial Intelligence

Instead, Steunebrink uses an approach called EXPAI (experience-based artificial intelligence). EXPAI are “self-improving systems that make tentative, additive, reversible, very fine-grained modifications, without prior self-reasoning; instead, self-modifications are tested over time against experiential evidences and slowly phased in when vindicated, or dismissed when falsified.”
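To make “tentative, additive, reversible, very fine-grained modifications” concrete, here is a toy sketch under loudly stated assumptions. It is not Steunebrink’s architecture. A candidate change is blended into the agent’s behavior with a small weight, tested against experience, phased in one increment at a time while it does at least as well, and dismissed (rolled back to the unmodified baseline) as soon as it does worse. The policies, the error-based score, and the names `blend` and `phase_in` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Policy = Callable[[float], float]
Experience = Sequence[Tuple[float, float]]  # (input, observed target) pairs

def blend(baseline: Policy, candidate: Policy, w: float) -> Policy:
    """Additive, reversible change: at w = 0 the agent behaves exactly like the baseline."""
    return lambda x: (1 - w) * baseline(x) + w * candidate(x)

def mean_score(policy: Policy, experience: Experience) -> float:
    """Average negative absolute error on past experience (higher is better)."""
    return sum(-abs(policy(x) - y) for x, y in experience) / len(experience)

def phase_in(baseline: Policy, candidate: Policy,
             experience: Experience, steps: int = 10) -> float:
    """Phase in `candidate` one fine-grained increment (1/steps) at a time.

    An increment is kept only if experience vindicates it; the first increment
    that scores worse dismisses the modification entirely. Returns the final weight.
    """
    k = 0
    while k < steps:
        current = mean_score(blend(baseline, candidate, k / steps), experience)
        trial = mean_score(blend(baseline, candidate, (k + 1) / steps), experience)
        if trial < current:
            return 0.0  # falsified: revert to the unmodified baseline
        k += 1          # vindicated: keep this slightly larger share
    return k / steps

# Hypothetical world where the right answer is 2x.
experience = [(x, 2.0 * x) for x in range(1, 6)]
baseline = lambda x: 1.5 * x
good_change = lambda x: 2.0 * x
bad_change = lambda x: 0.5 * x
print(phase_in(baseline, good_change, experience))  # → 1.0
print(phase_in(baseline, bad_change, experience))   # → 0.0
```

No proof about the change is attempted beforehand; whether it survives depends only on how it performs against the evidence the agent actually accumulates.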

Instead of trusting only a mathematical proof, researchers can ensure that the AI develops safe and benevolent behaviors by teaching and testing the machine in complex, unforeseen environments that challenge its function and goals.

With EXPAI, AI systems learn from interactive experience, so monitoring their growth period is crucial. As Steunebrink posits, the focus shifts from asking, “What is the behavior of an agent that is very intelligent and capable of self-modification, and how do we control it?” to asking, “How do we grow an agent from baby beginnings such that it gains both robust understanding and proper values?”

Consider how children grow and learn to navigate the world independently. If provided with a stable and healthy childhood, children learn to adopt values and understand their relation to the external world through trial and error, and by examples. Childhood is a time of growth and learning, of making mistakes, of building on success – all to help prepare the child to grow into a competent adult who can navigate unforeseen circumstances.

Steunebrink believes that researchers can ensure safe AI through a similar, gradual process of experience-based learning. In an architectural blueprint developed by Steunebrink and his colleagues, the AI is constructed “starting from only a small amount of designer-specific code – a seed.” Like a child, the beginnings of the machine will be less competent and less intelligent, but it will self-improve over time, as it learns from teachers and real-world experience.

As Steunebrink’s approach focuses on the growth period of an autonomous agent, the teachers, not the programmers, are most responsible for creating a robust and benevolent AI. Meanwhile, the developmental stage gives researchers time to observe and correct an AI’s behavior in a controlled setting where the stakes are still low.

The Future of EXPAI

Steunebrink and his colleagues are currently creating what he describes as a “pedagogy to determine what kind of things to teach to agents and in what order, how to test what the agents understand from being taught, and, depending on the results of such tests, decide whether we can proceed to the next steps of teaching or whether we should reteach the agent or go back to the drawing board.”
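The teach, test, and decide cycle in that description can be outlined as a simple control loop. This is only a sketch: the lesson names, the pass mark, the reteach limit, and the `ToyAgent` learner are hypothetical placeholders, not part of the IDSIA pedagogy.

```python
from typing import Callable, Dict, List

def run_curriculum(
    lessons: List[str],
    teach: Callable[[str], None],
    test: Callable[[str], float],
    pass_mark: float = 0.8,
    max_reteach: int = 2,
) -> str:
    """Teach lessons in order, testing understanding after each one.

    Scoring at least `pass_mark` lets the agent proceed to the next lesson;
    otherwise the lesson is retaught, up to `max_reteach` extra times. If it
    still fails, the curriculum stops so designers can go back to the drawing board.
    """
    for lesson in lessons:
        for _ in range(1 + max_reteach):
            teach(lesson)
            if test(lesson) >= pass_mark:
                break  # understood: proceed to the next lesson
        else:
            return f"stopped at {lesson!r}: back to the drawing board"
    return "curriculum complete"

class ToyAgent:
    """Hypothetical learner whose test score rises by a fixed gain each time a lesson is taught."""

    def __init__(self, gains: Dict[str, float]) -> None:
        self.gains = gains
        self.knowledge = {name: 0.0 for name in gains}

    def teach(self, lesson: str) -> None:
        self.knowledge[lesson] = min(1.0, self.knowledge[lesson] + self.gains[lesson])

    def test(self, lesson: str) -> float:
        return self.knowledge[lesson]

agent = ToyAgent({"objects": 0.5, "people": 0.3, "values": 0.1})
print(run_curriculum(["objects", "people", "values"], agent.teach, agent.test))
# → stopped at 'values': back to the drawing board
```

Here “objects” passes on the second try and “people” on the third, while “values” never reaches the pass mark, so the loop halts instead of moving on with a gap in the agent’s understanding.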

A major issue Steunebrink faces is that his method of experience-based learning diverges from the most popular methods for improving AI. Instead of doing the intellectual work of crafting a proof-backed optimal learning algorithm on a computer, EXPAI requires extensive in-person work with the machine to teach it like a child.

Creating safe artificial intelligence might prove to be more a process of teaching and growth than a matter of crafting the perfect mathematical proof. While such a shift in responsibility may be more time-consuming, it could also help establish a far more comprehensive understanding of an AI before it is released into the real world.

Steunebrink explains, “A lot of work remains to move beyond the agent implementation level, towards developing the teaching and testing methodologies that enable us to grow an agent’s understanding of ethical values, and to ensure that the agent is compelled to protect and adhere to them.”

The process is daunting, he admits, “but it is not as daunting as the consequences of getting AI safety wrong.”

If you would like to learn more about Bas Steunebrink’s research, visit http://people.idsia.ch/~steunebrink/. He is also the co-founder of NNAISENSE, which you can learn about at https://nnaisense.com/.


Training Artificial Intelligence to Compromise

Imagine you’re sitting in a self-driving car that’s about to make a left turn into oncoming traffic. One small system in the car will be responsible for making the vehicle turn, one system might speed it up or hit the brakes, other systems will have sensors that detect obstacles, and yet another system may be in communication with other vehicles on the road. Each system has its own goals (starting or stopping, turning or traveling straight, recognizing potential problems, etc.), but they also have to work together toward one common goal: turning into traffic without causing an accident.

Harvard professor and FLI researcher David Parkes is trying to solve just this type of problem. Parkes told FLI, “The particular question I’m asking is: If we have a system of AIs, how can we construct rewards for individual AIs, such that the combined system is well behaved?”

Essentially, an AI within a system of AIs, like the ones in the car example above, needs to learn how to meet its own objective, as well as how to compromise so that its actions help satisfy the group objective. On top of that, the system of AIs needs to consider the preferences of society: the safety of the passenger in the car or a pedestrian in the crosswalk is a higher priority than turning left.

Training a well-behaved AI

Because environments like a busy street are so complicated, an engineer can’t simply program an AI with fixed rules that always achieve its objectives. Instead, AIs need to learn proper behavior through a rewards system. “Each AI has a reward for its action and the action of the other AI,” Parkes explained. Because the world is constantly changing, the rewards have to evolve, and the AIs need to keep up not only with how their own goals change, but also with the evolving objectives of the system as a whole.

The idea of a rewards-based learning system is something most people can likely relate to. Who doesn’t remember the excitement of a gold star or a smiley face on a test? And any dog owner has experienced how much more likely their pet is to perform a trick when it realizes it will get a treat. A reward for an AI is similar.

A technique often used in designing artificial intelligence is reinforcement learning. With reinforcement learning, when the AI takes some action, it receives positive, negative, or neutral feedback, and it then adjusts its actions to collect more positive reward. However, the AI cannot simply be told in advance which actions are good: it has to interact with its environment to learn which actions will be considered good, bad or neutral. Again, the idea is similar to a dog learning that tricks can earn it treats or praise, while misbehaving could result in punishment.
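The feedback loop described above can be illustrated with one of the simplest reinforcement-learning setups, an epsilon-greedy multi-armed bandit. This is a standard textbook technique chosen for illustration, not Parkes’ method; the dog-training actions and their rewards are hypothetical. The agent starts with no knowledge of which actions are good and learns value estimates purely from the feedback it receives.

```python
import random
from typing import Callable, Dict, List

def learn_action_values(
    actions: List[str],
    feedback: Callable[[str], float],
    steps: int = 200,
    epsilon: float = 0.1,
    seed: int = 0,
) -> Dict[str, float]:
    """Epsilon-greedy learning of how good each action is, purely from feedback.

    Estimates start optimistic (above any real reward) so every action gets tried;
    each estimate is then the running average of the rewards actually observed.
    """
    rng = random.Random(seed)
    value = {a: 2.0 for a in actions}  # optimistic initial estimates
    count = {a: 0 for a in actions}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)          # explore at random
        else:
            action = max(actions, key=value.get)  # exploit the best estimate so far
        reward = feedback(action)
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]  # incremental mean
    return value

# Hypothetical environment: a treat (+1), a scolding (-1), or nothing (0).
rewards = {"sit": 1.0, "chew the sofa": -1.0, "wander off": 0.0}
learned = learn_action_values(list(rewards), rewards.get)
print(learned, max(learned, key=learned.get))
```

Because this toy environment always gives the same feedback for the same action, each estimate settles on that action’s reward after the first try, and the agent then keeps choosing “sit” except on occasional exploratory steps.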

More than this, Parkes wants to understand how to distribute rewards to subcomponents – the individual AIs – in order to achieve good system-wide behavior. How often should there be positive (or negative) reinforcement, and in reaction to which types of actions?
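One standard answer to the credit-assignment question from the multi-agent learning literature is the difference reward: each component is rewarded by how much the system-wide score would drop if its own action were replaced by a null action. The article does not say Parkes uses this exact scheme; the sensor names, zones, and coverage objective below are hypothetical.

```python
from typing import Callable, Dict, Optional

JointAction = Dict[str, Optional[str]]  # agent name -> zone it watches (None = idle)

def coverage(joint: JointAction) -> int:
    """Hypothetical system objective: number of distinct zones being watched."""
    return len({zone for zone in joint.values() if zone is not None})

def difference_rewards(joint: JointAction,
                       system_score: Callable[[JointAction], float]) -> Dict[str, float]:
    """Reward each agent with its marginal contribution to the system score.

    D_i = G(z) - G(z with agent i's action replaced by a null action).
    """
    total = system_score(joint)
    rewards = {}
    for agent in joint:
        counterfactual = dict(joint)
        counterfactual[agent] = None  # agent i sits this round out
        rewards[agent] = total - system_score(counterfactual)
    return rewards

joint = {"front_camera": "crosswalk", "radar": "crosswalk", "side_sensor": "left_lane"}
print(difference_rewards(joint, coverage))
# → {'front_camera': 0, 'radar': 0, 'side_sensor': 1}
```

Handing every component the shared score (2 here) would reward the two redundant sensors as much as the useful one; the difference reward gives them nothing, nudging one of them to watch a different zone.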

For example, if you were to play a video game without any points or lives or levels or other indicators of success or failure, you might run around the world killing or fighting aliens and monsters, and you might eventually beat the game, but you wouldn’t know which specific actions led you to win. Instead, games are designed to provide regular feedback and reinforcement so that you know when you make progress and what steps you need to take next. To train an AI, Parkes has to determine which smaller actions will merit feedback so that the AI can move toward a larger, overarching goal.

Rather than programming a reward specifically into the AI, Parkes shapes the way rewards flow from the environment to the AI in order to promote desirable behaviors as the AI interacts with the world around it.
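A well-known way to shape how rewards flow to an agent without changing which complete behavior is best is potential-based reward shaping, introduced by Ng, Harada and Russell in 1999: add the term γΦ(s′) − Φ(s) to the environment’s reward. Whether Parkes uses this particular form is not stated here; the corridor, the goal position, and the potential function Φ in this sketch are hypothetical.

```python
GOAL = 5      # hypothetical goal cell at the end of a 1-D corridor
GAMMA = 1.0   # no discounting, to keep the arithmetic easy to follow

def env_reward(state: int, next_state: int) -> float:
    """Sparse reward: 1 only on reaching the goal, like 'beating the game'."""
    return 1.0 if next_state == GOAL and state != GOAL else 0.0

def potential(state: int) -> float:
    """Hypothetical potential: higher the closer the agent is to the goal."""
    return float(-abs(GOAL - state))

def shaped_reward(state: int, next_state: int) -> float:
    """Environment reward plus the shaping term gamma * phi(s') - phi(s)."""
    return env_reward(state, next_state) + GAMMA * potential(next_state) - potential(state)

path = [0, 1, 2, 1, 2, 3, 4, 5]  # includes one step backwards, from 2 to 1
rewards = [shaped_reward(s, t) for s, t in zip(path, path[1:])]
print(rewards)  # → [1.0, 1.0, -1.0, 1.0, 1.0, 1.0, 2.0]
```

Every step now carries immediate feedback, like points in a game, but with γ = 1 the shaping terms telescope to Φ(goal) − Φ(start), so every route from start to goal receives the same total bonus (here 5) and the shaping never makes a worse route look better overall.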

But this is all for just one AI. How do these techniques apply to two or more AIs?

Training a system of AIs

Much of Parkes’ work involves game theory. Game theory helps researchers understand what types of rewards will elicit collaboration among otherwise self-interested players, or in this case, rational AIs. Once an AI figures out how to maximize its own reward, what will entice it to act in accordance with another AI? To answer this question, Parkes turns to an economic theory called mechanism design.

Mechanism design is a Nobel Prize-winning theory that lets researchers determine how a system with multiple parts can achieve an overarching goal. It is a kind of “inverse game theory”: rather than predicting how players will behave under given rules, it asks how the rules of interaction, such as the ways rewards are distributed, can be designed so that individual AIs will act in favor of system-wide and societal preferences. Among other things, mechanism design has been applied to problems in auctions, e-commerce, regulation, environmental policy, and now, artificial intelligence.
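The textbook example of this “inverse” approach comes from auctions: the second-price sealed-bid (Vickrey) auction. The highest bidder wins but pays the second-highest bid, a rule under which reporting one’s true value is a dominant strategy, so self-interested bidders end up serving the designer’s goal of giving the item to whoever values it most. This is a classic illustration, not the mechanism from Parkes’ project; the drones and their values are hypothetical.

```python
from typing import Dict, Tuple

def vickrey_auction(bids: Dict[str, float]) -> Tuple[str, float]:
    """Second-price sealed-bid auction: highest bid wins and pays the second-highest bid."""
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, price = ranked[0][0], ranked[1][1]
    return winner, price

def utility(agent: str, true_value: float, bids: Dict[str, float]) -> float:
    """Payoff to `agent`: its true value minus the price if it wins, otherwise 0."""
    winner, price = vickrey_auction(bids)
    return true_value - price if winner == agent else 0.0

# Hypothetical drones bidding for a single charging slot, each bidding its true value.
values = {"drone_a": 10.0, "drone_b": 7.0, "drone_c": 4.0}
truthful = utility("drone_a", values["drone_a"], values)
for misreport in (6.0, 8.0, 20.0):
    bids = {**values, "drone_a": misreport}
    print(misreport, utility("drone_a", values["drone_a"], bids))
print(vickrey_auction(values), truthful)  # → ('drone_a', 7.0) 3.0
```

Underbidding (6.0) loses the slot and earns 0, while overbidding (8.0 or 20.0) still pays 7.0 and earns the same 3.0 as bidding truthfully, so no misreport does better. The rule itself carries the incentive alignment; no bidder has to be altruistic.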

The difference between Parkes’ work with AIs and classic mechanism design is that the latter typically assumes some sort of mechanism or manager overseeing the entire system. In the case of an automated car or a drone, the AIs within have to work together to achieve group goals, without a mechanism making final decisions. As the environment changes, the external rewards will change. And as the AIs within the system realize they want to make some sort of change to maximize their rewards, they’ll have to communicate with each other, shifting the goals for the entire autonomous system.

Parkes summarized his work for FLI, saying, “The work that I’m doing as part of the FLI grant program is all about aligning incentives so that when autonomous AIs decide how to act, they act in a way that’s not only good for the AI system, but also good for society more broadly.”

Parkes is also involved with the One Hundred Year Study on Artificial Intelligence, and he explained his “research with FLI has informed a broader perspective on thinking about the role that AI can play in an urban context in the near future.” As he considers the future, he asks, “What can we see, for example, from the early trajectory of research and development on autonomous vehicles and robots in the home, about where the hard problems will be in regard to the engineering of value-aligned systems?”


The Evolution of AI: Can Morality be Programmed?

The following article was originally posted on Futurism.com.

Recent advances in artificial intelligence have made it clear that our computers need to have a moral code. Disagree? Consider this: A car is driving down the road when a child on a bicycle suddenly swerves in front of it. Does the car swerve into an oncoming lane, hitting another car that is already there? Does the car swerve off the road and hit a tree? Does it continue forward and hit the child?

Each solution comes with a problem: It could result in death.

It’s an unfortunate scenario, but humans face such scenarios every day, and if an autonomous car is the one in control, it needs to be able to make this choice. And that means that we need to figure out how to program morality into our computers.

Vincent Conitzer, a Professor of Computer Science at Duke University, recently received a grant from the Future of Life Institute to figure out just how we can make an advanced AI that is able to make moral judgments…and act on them.

MAKING MORALITY

At first glance, the goal seems simple enough: make an AI that behaves in a way that is ethically responsible. However, it’s far more complicated than it initially seems, as a remarkable number of factors come into play. As Conitzer’s project outlines, “moral judgments are affected by rights (such as privacy), roles (such as in families), past actions (such as promises), motives and intentions, and other morally relevant features. These diverse factors have not yet been built into AI systems.”

That’s what we’re trying to do now.

In a recent interview with Futurism, Conitzer clarified that, while the public may be concerned about ensuring that rogue AI don’t decide to wipe out humanity, such a thing really isn’t a viable threat at the present time (and it won’t be for a long, long time). As a result, his team isn’t concerned with preventing a global robotic apocalypse by making selfless AI that adore humanity. Rather, on a much more basic level, they are focused on ensuring that our artificial intelligence systems are able to make the hard moral choices that humans make on a daily basis.

So, how do you make an AI that is able to make a difficult moral decision?

Conitzer explains that, to reach their goal, the team is following a two-part process: having people make ethical choices in order to find patterns, and then figuring out how those patterns can be translated into an artificial intelligence. He clarifies, “what we’re working on right now is actually having people make ethical decisions, or state what decision they would make in a given situation, and then we use machine learning to try to identify what the general pattern is and determine the extent that we could reproduce those kind of decisions.”

In short, the team is trying to find the patterns in our moral choices and translate this pattern into AI systems. Conitzer notes that, on a basic level, it’s all about making predictions regarding what a human would do in a given situation, “if we can become very good at predicting what kind of decisions people make in these kind of ethical circumstances, well then, we could make those decisions ourselves in the form of the computer program.”
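In its simplest form, “predicting what kind of decisions people make” is supervised learning: describe each dilemma with a few features, record what respondents chose, and predict the choice for a new dilemma from the most similar recorded ones. The sketch uses k-nearest neighbours on invented features and invented judgments purely to show the shape of the idea; it is not Conitzer’s model or data.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Features = Tuple[float, float, float]  # (people helped by acting, people harmed by acting, breaks a promise 0/1)
Judgment = Tuple[Features, str]        # (dilemma features, what the respondent chose)

def predict(judgments: Sequence[Judgment], case: Features, k: int = 3) -> str:
    """Predict the majority choice among the k most similar recorded judgments."""
    nearest = sorted(judgments, key=lambda j: math.dist(j[0], case))[:k]
    votes = Counter(choice for _, choice in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical, made-up survey responses for illustration only.
survey = [
    ((5, 1, 0), "act"),
    ((4, 0, 1), "act"),
    ((3, 0, 0), "act"),
    ((1, 1, 0), "refrain"),
    ((1, 0, 1), "refrain"),
    ((0, 2, 0), "refrain"),
]
print(predict(survey, (4, 1, 0)))  # → act
print(predict(survey, (1, 1, 1)))  # → refrain
```

A model like this can only reproduce the patterns present in the judgments it is given, which also means it inherits whatever biases those judgments contain.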

However, one major problem with this is, of course, that morality is not objective — it’s neither timeless nor universal.

Conitzer articulates the problem by looking to previous decades, “if we did the same ethical tests a hundred years ago, the decisions that we would get from people would be much more racist, sexist, and all kinds of other things that we wouldn’t see as ‘good’ now. Similarly, right now, maybe our moral development hasn’t come to its apex, and a hundred years from now people might feel that some of the things we do right now, like how we treat animals, is completely immoral. So there’s kind of a risk of bias and with getting stuck at whatever our current level of moral development is.”

And of course, there is the aforementioned problem of how complex morality is. As Conitzer notes, “Pure altruism, that’s very easy to address in game theory, but maybe you feel like you owe me something based on previous actions. That’s missing from the game theory literature, and so that’s something that we’re also thinking about a lot: how can you make this, what game theory calls ‘solution concepts,’ sensible? How can you compute these things?”

To solve these problems, and to help figure out exactly how morality functions and can (hopefully) be programmed into an AI, the team is combining methods from computer science, philosophy, and psychology. “That’s, in a nutshell, what our project is about,” Conitzer asserts.

But what about those sentient AI? When will we need to start worrying about them and discussing how they should be regulated?

THE HUMAN-LIKE AI

According to Conitzer, human-like artificial intelligence won’t be around for some time yet (so yay! No Terminator-styled apocalypse…at least for the next few years).

“Recently, there have been a number of steps towards such a system, and I think there have been a lot of surprising advances….but I think having something like a ‘true AI,’ one that’s really as flexible, able to abstract, and do all these things that humans do so easily, I think we’re still quite far away from that,” Conitzer asserts.

True, we can program systems to do a lot of things that humans do well, but there are some things that are exceedingly complex and hard to translate into a pattern that computers can recognize and learn from (which is ultimately the basis of all AI).

“What came out of early AI research, the first couple decades of AI research, was the fact that certain things that we had thought of as being real benchmarks for intelligence, like being able to play chess well, were actually quite accessible to computers. It was not easy to write and create a chess-playing program, but it was doable.”

Indeed, today we have computers that are able to beat the best players in the world at a host of games, chess and Go among them (the latter most famously by DeepMind’s AlphaGo).

But Conitzer clarifies that, as it turns out, playing games isn’t exactly a good measure of human-like intelligence. Or at least, there is a lot more to the human mind. “Meanwhile, we learned that other problems that were very simple for people were actually quite hard for computers, or to program computers to do. For example, recognizing your grandmother in a crowd. You could do that quite easily, but it’s actually very difficult to program a computer to recognize things that well.”

Since the early days of AI research, we have built computers that are able to recognize and identify specific images. However, to sum up the main point, it is remarkably difficult to program a system that is able to do all of the things that humans can do, which is why it will be some time before we have a ‘true AI.’

Yet Conitzer asserts that now is the time to start considering the rules we will use to govern such intelligences. “It may be quite a bit further out, but to computer scientists, that means maybe just on the order of decades, and it definitely makes sense to try to think about these things a little bit ahead.” And he notes that, even though we don’t have any human-like robots just yet, our intelligence systems are already making moral choices and could, potentially, save or end lives.

“Very often, many of these decisions that they make do impact people and we may need to make decisions that we will typically be considered to be a morally loaded decision. And a standard example is a self-driving car that has to decide to either go straight and crash into the car ahead of it or veer off and maybe hurt some pedestrian. How do you make those trade-offs? And that I think is something we can really make some progress on. This doesn’t require superintelligent AI, this can just be programs that make these kind of trade-offs in various ways.”

But of course, knowing what decision to make will first require knowing exactly how our morality operates (or at least having a fairly good idea). From there, we can begin to program it, and that’s what Conitzer and his team are hoping to do.

So welcome to the dawn of moral robots.

This interview has been edited for brevity and clarity. 


Grants Timeline