Transparent and Interpretable AI: an interview with Percy Liang

At the end of 2017, the United States House of Representatives passed a bill called the SELF DRIVE Act, laying out an initial federal framework for autonomous vehicle regulation. Autonomous cars have been undergoing testing on public roads for almost two decades. With the passing of this bill, along with the increasing safety benefits of autonomous vehicles, it is likely that they will become even more prevalent in our daily lives. This is true for numerous autonomous technologies including those in the medical, legal, and safety fields – just to name a few.

To that end, researchers, developers, and users alike must be able to have confidence in these types of technologies that rely heavily on artificial intelligence (AI). This extends beyond autonomous vehicles, applying to everything from security devices in your smart home to the personal assistant in your phone.

 

Predictability in Machine Learning

Percy Liang, Assistant Professor of Computer Science at Stanford University, explains that humans rely on some degree of predictability in their day-to-day interactions — both with other humans and automated systems (including, but not limited to, their cars). One way to create this predictability is by taking advantage of machine learning.

Machine learning deals with algorithms that allow an AI to “learn” based on data gathered from previous experiences. Developers do not need to write code that dictates each and every action or intention for the AI. Instead, the system recognizes patterns from its experiences and assumes the appropriate action based on that data. It is akin to the process of trial and error.

A key question often asked of machine learning systems in the research and testing environment is, “Why did the system make this prediction?” About this search for intention, Liang explains:

“If you’re crossing the road and a car comes toward you, you have a model of what the other human driver is going to do. But if the car is controlled by an AI, how should humans know how to behave?”

It is important to see that a system is performing well, but perhaps even more important is its ability to explain in easily understandable terms why it acted the way it did. Even if the system is not accurate, it must be explainable and predictable. For AI to be safely deployed, systems must rely on well-understood, realistic, and testable assumptions.

Current theories that explore the idea of reliable AI focus on fitting the observable outputs in the training data. However, as Liang explains, this could lead “to an autonomous driving system that performs well on validation tests but does not understand the human values underlying the desired outputs.”

Running multiple tests is important, of course. These types of simulations, explains Liang, “are good for debugging techniques — they allow us to more easily perform controlled experiments, and they allow for faster iteration.”

However, to really know whether a technique is effective, “there is no substitute for applying it to real life,” says Liang. “This goes for language, vision, and robotics.” An autonomous vehicle may perform well in all testing conditions, but there is no way to accurately predict how it would perform in an unpredictable natural disaster.

 

Interpretable ML Systems

The best-performing models in many domains (e.g., deep neural networks for image and speech recognition) are quite complex. These are considered “black-box models,” and their predictions can be difficult, if not impossible, to explain.

Liang and his team are working to interpret these models by researching how a particular training situation leads to a prediction. As Liang explains, “Machine learning algorithms take training data and produce a model, which is used to predict on new inputs.”

This type of observation becomes increasingly important as AIs take on more complex tasks – think life or death situations, such as interpreting medical diagnoses. “If the training data has outliers or adversarially generated data,” says Liang, “this will affect (corrupt) the model, which will in turn cause predictions on new inputs to be possibly wrong.  Influence functions allow you to track precisely the way that a single training point would affect the prediction on a particular new input.”
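To make the idea concrete, here is a minimal sketch of an influence estimate for a one-parameter least-squares model. It is an illustration only, not the code used by Liang's team: the toy data, the function names, and the choice of model are all assumptions made for this example. It estimates how the loss on one test point would change if a single training point were removed, then compares that first-order estimate with actually retraining without the point.

```python
# Toy illustration of an influence estimate (all names and data are hypothetical).
# Model: y ≈ w * x, trained by least squares with per-point loss 0.5 * (w*x - y)**2.

def fit(points):
    """Closed-form least-squares slope for a model with no intercept."""
    return sum(x * y for x, y in points) / sum(x * x for x, _ in points)

def loss(w, point):
    x, y = point
    return 0.5 * (w * x - y) ** 2

def grad(w, point):
    """Derivative of the per-point loss with respect to w."""
    x, y = point
    return (w * x - y) * x

def estimated_change_if_removed(train, test_point, index):
    """First-order estimate of the change in test loss if train[index] is dropped."""
    n = len(train)
    w = fit(train)
    hessian = sum(x * x for x, _ in train) / n  # second derivative of the mean loss
    return grad(w, test_point) * grad(w, train[index]) / (hessian * n)

def actual_change_if_removed(train, test_point, index):
    """Ground truth: retrain without the point and measure the test loss again."""
    reduced = train[:index] + train[index + 1:]
    return loss(fit(reduced), test_point) - loss(fit(train), test_point)

TRAIN = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 10.0)]  # last point is an outlier
TEST = (3.0, 3.0)

for i in range(len(TRAIN)):
    est = estimated_change_if_removed(TRAIN, TEST, i)
    act = actual_change_if_removed(TRAIN, TEST, i)
    print(f"point {i}: estimated {est:+.3f}, retrained {act:+.3f}")
```

In this toy setup the outlier receives the largest estimated influence, and removing it lowers the test loss. With only five training points the first-order estimate is coarse, so the estimated and retrained values agree in sign but not exactly in size.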

Essentially, by understanding why a model makes the decisions it makes, Liang’s team hopes to improve how models function, discover new science, and provide end users with explanations of actions that impact them.

Another aspect of Liang’s research is ensuring that an AI understands, and is able to communicate, its limits to humans. The conventional metric for success, he explains, is average accuracy, “which is not a good interface for AI safety.” He posits, “what is one to do with an 80 percent reliable system?”

Liang is not looking for the system to have an accurate answer 100 percent of the time. Instead, he wants the system to be able to admit when it does not know an answer. If a user asks a system “How many painkillers should I take?” it is better for the system to say, “I don’t know” rather than making a costly or dangerous incorrect prediction.
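One simple way to picture a system that admits what it does not know is a confidence threshold: the system answers only when its top probability is high enough. The sketch below is a hypothetical illustration, not a description of Liang's method; the function name, the threshold value, and the example probabilities are all assumptions, and it presumes the model's probabilities are well calibrated, which is itself a hard problem.

```python
# Hypothetical sketch: answer only when the model's top probability clears a threshold.

def answer_or_abstain(probabilities, threshold=0.95):
    """Return the most likely answer, or "I don't know" if confidence is too low.

    `probabilities` maps each candidate answer to the model's estimated probability.
    """
    best_label = max(probabilities, key=probabilities.get)
    if probabilities[best_label] < threshold:
        return "I don't know"
    return best_label

# A split prediction is withheld; a confident one is returned.
print(answer_or_abstain({"answer A": 0.55, "answer B": 0.45}))
print(answer_or_abstain({"answer A": 0.99, "answer B": 0.01}))
```

Raising the threshold trades coverage for safety: the system answers fewer questions but is less likely to give a confident wrong answer.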

Liang’s team is working on this challenge by tracking a model’s predictions through its learning algorithm — all the way back to the training data where the model parameters originated.

Liang’s team hopes that this approach — of looking at the model through the lens of the training data — will become a standard part of the toolkit of developing, understanding, and diagnosing machine learning. He explains that researchers could relate this to many applications: medical, computer, natural language understanding systems, and various business analytics applications.

“I think,” Liang concludes, “there is some confusion about the role of simulations: some eschew it entirely and some are happy doing everything in simulation. Perhaps we need to change culturally to have a place for both.”

In this way, Liang and his team plan to lay a framework for a new generation of machine learning algorithms that work reliably, fail gracefully, and reduce risks.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project. If you’re interested in applying for our 2018 grants competition, please see this link.

As CO2 Levels Rise, Scientists Question Best- and Worst-Case Scenarios of Climate Change

Scientists know that the planet is warming, that humans are causing it, and that we’re running out of time to avoid catastrophic climate change. But at the same time, their estimates for future global warming can seem frustratingly vague — best-case scenarios allow decades to solve the energy crisis, while worst-case scenarios seem utterly hopeless, predicting an uninhabitable planet no matter what we do.

At the University of Exeter, some researchers disagree with these vague boundaries. Professors Peter Cox, Chris Huntingford, and Mark Williamson co-authored a recent report in Nature that argues for a more constrained understanding of the climate’s sensitivity to carbon dioxide. In general, they found that both the worst-case and best-case scenarios for global warming are far less likely than previously thought.

Their research focuses on a measure known as equilibrium climate sensitivity (ECS) — defined as “the global mean warming that would occur if the atmospheric carbon dioxide (CO2) concentration were instantly doubled and the climate were then brought to equilibrium with that new level of CO2.”

This concept simplifies Earth’s actual climate — CO2 won’t double instantly and it often takes decades or centuries for the climate to return to equilibrium — but ECS is critical for gauging the planet’s response to fossil fuel emissions. It can help predict how much warming will come from increases in atmospheric CO2, even before the climate settles into equilibrium.

 

How hot will it get if atmospheric CO2 doubles?

In other words, what is Earth’s ECS? The Intergovernmental Panel on Climate Change (IPCC) estimates that ECS is between 1.5 and 4.5 °C, with a 25% chance that it exceeds 4 °C and a 16% chance that it’s lower than 1.5 °C.

Cox and his colleagues argue that this range is too generous. Using tighter constraints based on historical observations of warming, they conclude that doubling atmospheric CO2 would push temperatures between 2.2 and 3.4 °C higher, with a 2% chance that ECS exceeds 4 °C and a 3% chance that ECS is lower than 1.5 °C. The extremes (both good and bad) of global warming thus appear less likely.

Although some scientists applauded these findings, others are more skeptical. Kevin Trenberth, a Senior Scientist in the Climate Analysis Section at the National Center for Atmospheric Research (NCAR), says the study’s climate models don’t adequately account for natural variability, making it difficult to give the findings much weight.

“I do think some previous estimates are overblown and they do not adequately use the observations we have as constraints,” he explains. “This study picks up on that a bit, and in that sense the new results seem reasonable and could be important for ruling out really major extreme changes. But it is much more important to improve the models and make better projections into the future.”

 

But When Will Atmospheric CO2 Double?

CO2 levels may not have doubled from pre-industrial levels yet, but they’re increasing at an alarming rate.

In 1958, NOAA’s Mauna Loa Observatory opened in Hawaii to monitor atmospheric change. Its first readings of atmospheric CO2 came in at about 315 parts per million (ppm), already above the pre-industrial level of roughly 280 ppm. In 2013, CO2 levels surpassed 400 ppm for the first time, and just four years later, the observatory recorded its first-ever carbon dioxide reading above 410 ppm.

The last time CO2 levels were this high, global surface temperatures were 6 °C higher, oceans were 100 feet higher, and modern humans didn’t exist. Unless the international community makes massive strides towards the Paris Agreement goals, atmospheric CO2 could rise to 560 ppm by 2050, double the pre-industrial concentration and a sign of much more global warming to come.
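As a rough worked example of how ECS is used, the sketch below assumes the common approximation that equilibrium warming grows with the logarithm of CO2 concentration, so that each doubling adds one ECS worth of warming. That logarithmic rule, the function name, and the choice of 280 ppm as the baseline (the level whose doubling gives the 560 ppm figure above) are assumptions for illustration and are not taken from the Cox study.

```python
import math

BASELINE_PPM = 280.0  # assumed baseline; doubling it gives 560 ppm

def equilibrium_warming(ecs_celsius, co2_ppm, baseline_ppm=BASELINE_PPM):
    """Equilibrium warming in °C, assuming warming scales with log2 of the CO2 ratio."""
    return ecs_celsius * math.log2(co2_ppm / baseline_ppm)

for ecs in (2.2, 3.4):  # the range reported by Cox and colleagues
    print(f"ECS {ecs} °C: {equilibrium_warming(ecs, 410):.2f} °C at 410 ppm, "
          f"{equilibrium_warming(ecs, 560):.2f} °C at 560 ppm")
```

These are equilibrium values, meaning the warming the climate would eventually settle at, not the warming observed at the moment a given concentration is reached.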

[Figure: Annual CO2 Emissions from Fossil Fuels by Country, 1959-2017. Source: Carbon Brief]

Avoiding the worst, while ensuring the bad

On the one hand, Cox’s findings come as a relief, as they reduce uncertainty about ECS and renew hope of avoiding catastrophic global warming.

But these results also imply that there’s very little hope of achieving the most ambitious goal of the Paris Agreement, which seeks to limit warming to 1.5 °C. Since atmospheric CO2 levels could plausibly double by midcentury, Cox’s results indicate that temperatures will not only soar past 1.5 °C but also quickly rise above Paris’ upper limit of 2 °C.

Even 2 °C of warming would be devastating for the planet, leading to an ice-free Arctic and over a meter of sea level rise (enough to submerge the Marshall Islands) while leaving tropical regions deathly hot for outdoor workers and metropolises such as Karachi and Kolkata nearly uninhabitable. Deadly heat waves would plague North Africa, Central America, Southeast Asia, and the Southeast US, and yields of wheat, rice, and corn would fall by over 20%. Food shortages and extreme weather could trigger the migration of tens of millions of people and leave regions of the world ungovernable.

This two-degree world might not be far off. Global temperatures have already risen 0.8 °C above pre-industrial levels, and the past few years have provided grave indications that things are heating up.

In January, NASA announced that 2017 was the second-hottest year on record (behind 2016 and ahead of 2015) while NOAA recorded it as their third-hottest year on record. Despite this minor discrepancy, both agencies agree that the 2017 data make the past four years the hottest period in their 138-year archives.

Global warming continues, and since the climate responds to rising CO2 levels on a delay of decades, there is more warming “in the pipeline,” no matter how quickly we cut fossil fuel emissions. But understanding ECS and continuing to improve climate models, as Dr. Trenberth suggests, can provide a clearer picture of what’s ahead and give us a better idea of the actions we need to take.

Is There a Trade-off Between Immediate and Longer-term AI Safety Efforts?

Something I often hear in the machine learning community and media articles is “Worries about superintelligence are a distraction from the *real* problem X that we are facing today with AI” (where X = algorithmic bias, technological unemployment, interpretability, data privacy, etc). This competitive attitude gives the impression that immediate and longer-term safety concerns are in conflict. But is there actually a tradeoff between them?

We can make this question more specific: what resources might these two types of efforts be competing for?

Media attention. Given the abundance of media interest in AI, there have been a lot of articles about all these issues. Articles about advanced AI safety have mostly been alarmist Terminator-ridden pieces that ignore the complexities of the problem. This has understandably annoyed many AI researchers, and led some of them to dismiss these risks based on the caricature presented in the media instead of the real arguments. The overall effect of media attention towards advanced AI risk has been highly negative. I would be very happy if the media stopped writing about superintelligence altogether and focused on safety and ethics questions about today’s AI systems.

Funding. Much of the funding for advanced AI safety work currently comes from donors and organizations who are particularly interested in these problems, such as the Open Philanthropy Project and Elon Musk. They would be unlikely to fund safety work that doesn’t generalize to advanced AI systems, so their donations to advanced AI safety research are not taking funding away from immediate problems. On the contrary, FLI’s first grant program awarded some funding towards current issues with AI (such as economic and legal impacts). There isn’t a fixed pie of funding that immediate and longer-term safety are competing for – it’s more like two growing pies that don’t overlap very much. There has been an increasing amount of funding going into both fields, and hopefully this trend will continue.

Talent. The field of advanced AI safety has grown in recent years but is still very small, and the “brain drain” resulting from researchers going to work on it has so far been negligible. The motivations for working on current and longer-term problems tend to be different as well, and these problems often attract different kinds of people. For example, someone who primarily cares about social justice is more likely to work on algorithmic bias, while someone who primarily cares about the long-term future is more likely to work on superintelligence risks.

Overall, there does not seem to be much tradeoff in terms of funding or talent, and the media attention tradeoff could (in theory) be resolved by devoting essentially all the airtime to current concerns. Not only are these issues not in conflict – there are synergies between addressing them. Both benefit from fostering a culture in the AI research community of caring about social impact and being proactive about risks. Some safety problems are highly relevant both in the immediate and longer term, such as interpretability and adversarial examples. I think we need more people working on these problems for current systems while keeping scalability to more advanced future systems in mind.

AI safety problems are too important for the discussion to be derailed by status contests like “my issue is better than yours”. This kind of false dichotomy is itself a distraction from the shared goal of ensuring AI has a positive impact on the world, both now and in the future. People who care about the safety of current and future AI systems are natural allies – let’s support each other on the path towards this common goal.

This article originally appeared on the Deep Safety blog.

MIRI’s January 2018 Newsletter

Our 2017 fundraiser was a huge success, with 341 donors contributing a total of $2.5 million!

Some of the largest donations came from Ethereum inventor Vitalik Buterin, bitcoin investors Christian Calderon and Marius van Voorden, poker players Dan Smith and Tom and Martin Crowley (as part of a matching challenge), and the Berkeley Existential Risk Initiative. Thank you to everyone who contributed!

Research updates

General updates

News and links

Rewinding the Doomsday Clock

On Thursday, the Bulletin of the Atomic Scientists inched their iconic Doomsday Clock forward another thirty seconds. It is now two minutes to midnight.

Citing the growing threats of climate change, increasing tensions between nuclear-armed countries, and a general loss of trust in government institutions, the Bulletin warned that we are “making the world security situation more dangerous than it was a year ago—and as dangerous as it has been since World War II.”

The Doomsday Clock hasn’t been this close to midnight since 1953, after the US and the Soviet Union tested their first hydrogen bombs, weapons up to 1,000 times more powerful than the bombs dropped on Hiroshima and Nagasaki. And as in 1953, this year’s announcement highlighted the increased global tensions around nuclear weapons.

As the Bulletin wrote in their statement, “To call the world nuclear situation dire is to understate the danger—and its immediacy.”

Between the US, Russia, North Korea, and Iran, the threats of aggravated nuclear war and accidental nuclear war both grew in 2017. As former Secretary of Defense William Perry said in a statement, “The events of the past year have only increased my concern that the danger of a nuclear catastrophe is increasingly real. We are failing to learn from the lessons of history as we find ourselves blundering headfirst towards a second cold war.”

The threat of nuclear war has hovered in the background since the weapons were invented, but with the end of the Cold War, many were pulled into what now appears to have been a false sense of security. In the last year, aggressive language and plans for new and upgraded nuclear weapons have reignited fears of nuclear armageddon. The recent false missile alerts in Hawaii and Japan were perhaps the starkest reminders of how close nuclear war feels, and how destructive it would be. 

 

But the nuclear threat isn’t all the Bulletin looks at. 2017 also saw the growing risk of climate change, a breakdown of trust in government institutions, and the emergence of new technological threats.

Climate change won’t hit humanity as immediately as nuclear war, but with each year that the international community fails to drastically reduce fossil fuel emissions, the threat of catastrophic climate change grows. In 2017, the US pulled out of the Paris Climate Agreement and global carbon emissions grew 2% after a two-year plateau. Meanwhile, NASA and NOAA confirmed that the past four years are the hottest four years they’ve ever recorded.

For emerging technological risks, such as widespread cyber attacks, the development of autonomous weaponry, and potential misuse of synthetic biology, the Bulletin calls for the international community to work together. They write, “world leaders also need to seek better collective methods of managing those advances, so the positive aspects of new technologies are encouraged and malign uses discovered and countered.”

Pointing to disinformation campaigns and “fake news”, the Bulletin’s Science and Security Board writes that they are “deeply concerned about the loss of public trust in political institutions, in the media, in science, and in facts themselves—a loss that the abuse of information technology has fostered.”

 

Turning Back the Clock

The Doomsday Clock is a poignant symbol of the threats facing human civilization, and it received broad media attention this week through British outlets like The Guardian and The Independent, Australian outlets such as ABC Online, and American outlets from Fox News to The New York Times.

“[The clock] is a tool,” explains Lawrence Krauss, a theoretical physicist at Arizona State University and member of the Bulletin’s Science and Security Board. “For one day a year, there are thousands of newspaper stories about the deep, existential threats that humanity faces.”

The Bulletin ends its report with a list of priorities to help turn back the Clock, chock-full of suggestions for government and industrial leaders. But the authors also insist that individual citizens have a crucial role in tackling humanity’s greatest risks.

“Leaders react when citizens insist they do so,” the authors explain. “Citizens around the world can use the power of the internet to improve the long-term prospects of their children and grandchildren. They can insist on facts, and discount nonsense. They can demand action to reduce the existential threat of nuclear war and unchecked climate change. They can seize the opportunity to make a safer and saner world.”

You can read the Bulletin’s full report here.

AI Should Provide a Shared Benefit for as Many People as Possible

Shared Benefit Principle: AI technologies should benefit and empower as many people as possible.

Today, the combined wealth of the eight richest people in the world is greater than that of the poorest half of the global population. That is, 8 people have more than the combined wealth of 3,600,000,000 others.

This is already an extreme example of wealth inequality, but if we don’t prepare properly for artificial intelligence, the situation could get worse. In addition to the obvious economic benefits that would accrue to whoever designs advanced AI first, those who profit from AI will also likely have access to better health care, happier and longer lives, more opportunities for their children, various forms of intelligence enhancement, and so on.

A Cultural Shift

Our approach to technology so far has been that whoever designs it first, wins — and they win big. In addition to the fabulous wealth an inventor can accrue, the creator of a new technology also assumes complete control over the product and its distribution. This means that an invention or algorithm will only benefit those whom the creator wants it to benefit. While this approach may have worked with previous inventions, many are concerned that advanced AI will be so powerful that we can’t treat it as business-as-usual.

What if we could ensure that as AI is developed we all benefit? Can we make a collective — and pre-emptive — decision to use AI to help raise up all people, rather than just a few?

Joshua Greene, a professor of psychology at Harvard, explains his take on this Principle: “We’re saying in advance, before we know who really has it, that this is not a private good. It will land in the hands of some private person, it will land in the hands of some private company, it will land in the hands of some nation first. But this principle is saying, ‘It’s not yours.’ That’s an important thing to say because the alternative is to say that potentially, the greatest power that humans ever develop belongs to whoever gets it first.”

AI researcher Susan Craw also agreed with the Principle, and she further clarified it.

“That’s definitely a yes,” Craw said, “But it is AI technologies plural, when it’s taken as a whole. Rather than saying that a particular technology should benefit lots of people, it’s that the different technologies should benefit and empower people.”

The Challenge of Implementation

However, as is the case with all of the Principles, agreeing with them is one thing; implementing them is another. John Havens, the Executive Director of The IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems, considered how the Shared Benefit Principle would ultimately need to be modified so that the new technologies will benefit both developed and developing countries alike.

“Yes, it’s great,” Havens said of the Principle, before adding, “if you can put a comma after it, and say … something like, ‘issues of wealth, GDP, notwithstanding.’ The point being, what this infers is whatever someone can afford, it should still benefit them.”

Patrick Lin, a philosophy professor at California Polytechnic State University, was even more concerned about how the Principle might be implemented, mentioning the potential for unintended consequences.

Lin explained: “Shared benefit is interesting, because again, this is a principle that implies consequentialism, that we should think about ethics as satisfying the preferences or benefiting as many people as possible. That approach to ethics isn’t always right. … Consequentialism often makes sense, so weighing these pros and cons makes sense, but that’s not the only way of thinking about ethics. Consequentialism could fail you in many cases. For instance, consequentialism might green-light torturing or severely harming a small group of people if it gives rise to a net increase in overall happiness to the greater community.”

“That’s why I worry about the … Shared Benefit Principle,” Lin continued. “[It] makes sense, but [it] implicitly adopts a consequentialist framework, which by the way is very natural for engineers and technologists to use, so they’re very numbers-oriented and tend to think of things in black and white and pros and cons, but ethics is often squishy. You deal with these squishy, abstract concepts like rights and duties and obligations, and it’s hard to reduce those into algorithms or numbers that could be weighed and traded off.”

As we move from discussing these Principles as ideals to implementing them as policy, concerns such as those that Lin just expressed will have to be addressed, keeping possible downsides of consequentialism and utilitarianism in mind.

The Big Picture

The devil will always be in the details. As we consider how we might shift cultural norms to prevent all benefits going only to the creators of new technologies — as well as considering the possible problems that could arise if we do so — it’s important to remember why the Shared Benefit Principle is so critical. Roman Yampolskiy, an AI researcher at the University of Louisville, sums this up:

“Early access to superior decision-making tools is likely to amplify existing economic and power inequalities, turning the rich into super-rich, permitting dictators to hold on to power and making oppositions’ efforts to change the system unlikely to succeed. Advanced artificial intelligence is likely to be helpful in medical research and genetic engineering in particular, making significant life extension possible, which would remove one of the most powerful drivers of change and redistribution of power – death. For this and many other reasons, it is important that AI tech should be beneficial and empowering to all of humanity, making all of us wealthier and healthier.”

What Do You Think?

How important is the Shared Benefit Principle to you? How can we ensure that the benefits of new AI technologies are spread globally, rather than remaining with only a handful of people who developed them? How can we ensure that we don’t inadvertently create more problems in an effort to share the benefits of AI?

This article is part of a series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the discussions about previous principles here.

Deep Safety: NIPS 2017 Report

This year’s NIPS gave me a general sense that near-term AI safety is now mainstream and long-term safety is slowly going mainstream. On the near-term side, I particularly enjoyed Kate Crawford’s keynote on neglected problems in AI fairness, the ML security workshops, and the Interpretable ML symposium debate that addressed the “do we even need interpretability?” question in a somewhat sloppy but entertaining way. There was a lot of great content on the long-term side, including several oral / spotlight presentations and the Aligned AI workshop.

Value alignment papers

Inverse Reward Design (Hadfield-Menell et al) defines the problem of an RL agent inferring a human’s true reward function based on the proxy reward function designed by the human. This is different from inverse reinforcement learning, where the agent infers the reward function from human behavior. The paper proposes a method for IRD that models uncertainty about the true reward, assuming that the human chose a proxy reward that leads to the correct behavior in the training environment. For example, if a test environment unexpectedly includes lava, the agent assumes that a lava-avoiding reward function is as likely as a lava-indifferent or lava-seeking reward function, since they lead to the same behavior in the training environment. The agent then follows a risk-averse policy with respect to its uncertainty about the reward function.

The paper shows some encouraging results on toy environments for avoiding some types of side effects and reward hacking behavior, though it’s unclear how well they will generalize to more complex settings. For example, the approach to reward hacking relies on noticing disagreements between different sensors / features that agreed in the training environment, which might be much harder to pick up on in a complex environment. The method is also at risk of being overly risk-averse and avoiding anything new, whether it be lava or gold, so it would be great to see some approaches for safe exploration in this setting.
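The lava example can be sketched in a few lines. The code below is a toy illustration of risk-averse planning under reward uncertainty and is not the algorithm from the Inverse Reward Design paper: the terrain names, reward weights, candidate set, and paths are all hypothetical, and the worst-case rule stands in for whatever risk-averse planner the authors actually use.

```python
# Toy illustration of risk-averse planning under reward uncertainty.
# NOT the algorithm from the paper; terrains, weights, and paths are hypothetical.

PROXY_REWARD = {"dirt": -1.0, "grass": -2.0, "lava": 0.0}  # lava never appeared in training

# Candidate true rewards that agree with the proxy on the training terrains
# but disagree about lava, which the designer never had to think about.
CANDIDATE_REWARDS = [dict(PROXY_REWARD, lava=w) for w in (-10.0, 0.0, 10.0)]

# Each path is summarized by how many cells of each terrain it crosses.
PATHS = {
    "shortcut through lava": {"dirt": 2, "lava": 1},
    "detour around lava": {"dirt": 3, "grass": 1},
}

def path_reward(reward, terrain_counts):
    """Total reward of a path under one reward function."""
    return sum(reward[terrain] * count for terrain, count in terrain_counts.items())

def proxy_choice(paths):
    """Trust the proxy reward literally."""
    return max(paths, key=lambda name: path_reward(PROXY_REWARD, paths[name]))

def risk_averse_choice(paths, candidates):
    """Pick the path whose worst-case reward over all candidates is highest."""
    return max(paths, key=lambda name: min(path_reward(r, paths[name]) for r in candidates))

print(proxy_choice(PATHS))
print(risk_averse_choice(PATHS, CANDIDATE_REWARDS))
```

An agent that takes the proxy literally cuts through the lava, while the worst-case rule takes the detour. The same rule would also steer away from an unfamiliar terrain that happens to be valuable, which is the over-caution described above.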

Repeated Inverse RL (Amin et al) defines the problem of inferring intrinsic human preferences that incorporate safety criteria and are invariant across many tasks. The reward function for each task is a combination of the task-invariant intrinsic reward (unobserved by the agent) and a task-specific reward (observed by the agent). This multi-task setup helps address the identifiability problem in IRL, where different reward functions could produce the same behavior.

The authors propose an algorithm for inferring the intrinsic reward while minimizing the number of mistakes made by the agent. They prove an upper bound on the number of mistakes for the “active learning” case where the agent gets to choose the tasks, and show that a certain number of mistakes is inevitable when the agent cannot choose the tasks (there is no upper bound in that case). Thus, letting the agent choose the tasks that it’s trained on seems like a good idea, though it might also result in a selection of tasks that is less interpretable to humans.

Deep RL from Human Preferences (Christiano et al) uses human feedback to teach deep RL agents about complex objectives that humans can evaluate but might not be able to demonstrate (e.g. a backflip). The human is shown two trajectory snippets of the agent’s behavior and selects which one more closely matches the objective. This method makes very efficient use of limited human feedback, scaling much better than previous methods and enabling the agent to learn much more complex objectives (as shown in MuJoCo and Atari).
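The comparison step can be illustrated with a small sketch that fits a reward model to pairwise preferences. It assumes a Bradley-Terry-style model, in which the probability that a human prefers one snippet grows with the difference in the snippets' summed predicted rewards. The linear reward, the two features, the toy snippets, and every function name are hypothetical simplifications; the paper trains a deep network rather than two weights.

```python
import math

def segment_return(weights, segment):
    """Sum of predicted per-step rewards (linear in the features) over a snippet."""
    return sum(sum(w * f for w, f in zip(weights, step)) for step in segment)

def preference_probability(weights, seg_a, seg_b):
    """Modeled probability that the human prefers seg_a over seg_b."""
    diff = segment_return(weights, seg_a) - segment_return(weights, seg_b)
    return 1.0 / (1.0 + math.exp(-diff))

def fit_reward(comparisons, n_features, lr=0.1, epochs=200):
    """Gradient ascent on the log-likelihood of the human's choices."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for seg_a, seg_b, a_preferred in comparisons:
            error = (1.0 if a_preferred else 0.0) - preference_probability(weights, seg_a, seg_b)
            for j in range(n_features):
                feature_diff = sum(s[j] for s in seg_a) - sum(s[j] for s in seg_b)
                weights[j] += lr * error * feature_diff
    return weights

# Hypothetical per-step features: (height gained, fell over).
FLIP = [(1.0, 0.0), (1.0, 0.0)]
CRASH = [(0.5, 1.0), (0.0, 1.0)]
STAND = [(0.0, 0.0), (0.0, 0.0)]

# Each tuple: (snippet A, snippet B, True if the human preferred A).
COMPARISONS = [(FLIP, CRASH, True), (STAND, CRASH, True), (STAND, FLIP, False)]

learned = fit_reward(COMPARISONS, n_features=2)
print(learned)
print(preference_probability(learned, FLIP, CRASH))
```

The learned weights can then serve as the reward signal for an ordinary RL algorithm, so the human only ever has to compare snippets and never has to write the reward by hand.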

Dynamic Safe Interruptibility for Decentralized Multi-Agent RL (El Mhamdi et al) generalizes the safe interruptibility problem to the multi-agent setting. Non-interruptible dynamics can arise in a group of agents even if each agent individually is indifferent to interruptions. This can happen if Agent B is affected by interruptions of Agent A and is thus incentivized to prevent A from being interrupted (e.g. if the agents are self-driving cars and A is in front of B on the road). The multi-agent definition focuses on preserving the system dynamics in the presence of interruptions, rather than on converging to an optimal policy, which is difficult to guarantee in a multi-agent setting.

Aligned AI workshop

This was a more long-term-focused version of the Reliable ML in the Wild workshop held in previous years. There were many great talks and posters there – my favorite talks were Ian Goodfellow’s “Adversarial Robustness for Aligned AI” and Gillian Hadfield’s “Incomplete Contracting and AI Alignment”.

Ian made the case that ML security is important for long-term AI safety. The effectiveness of adversarial examples is problematic not only from the near-term perspective of current ML systems (such as self-driving cars) being fooled by bad actors. It’s also bad news from the long-term perspective of aligning the values of an advanced agent, which could inadvertently seek out adversarial examples for its reward function due to Goodhart’s law. Relying on the agent’s uncertainty about the environment or human preferences is not sufficient to ensure safety, since adversarial examples can cause the agent to have arbitrarily high confidence in the wrong answer.

Gillian approached AI safety from an economics perspective, drawing parallels between specifying objectives for artificial agents and designing contracts for humans. The same issues that make contracts incomplete (the designer’s inability to consider all relevant contingencies or precisely specify the variables involved, and incentives for the parties to game the system) lead to side effects and reward hacking for artificial agents.

The central question of the talk was how we can use insights from incomplete contracting theory to better understand and systematically solve specification problems in AI safety, which is a really interesting research direction. The objective specification problem seems even harder to me than the incomplete contract problem, since the contract design process relies on some level of shared common sense between the humans involved, which artificial agents do not currently possess.

Interpretability for AI safety

I gave a talk at the Interpretable ML symposium on connections between interpretability and long-term safety, which explored what forms of interpretability could help make progress on safety problems (slides, video). Understanding our systems better can help ensure that safe behavior generalizes to new situations, and it can help identify causes of unsafe behavior when it does occur.

For example, if we want to build an agent that’s indifferent to being switched off, it would be helpful to see whether the agent has representations that correspond to an off-switch, and whether they are used in its decisions. Side effects and safe exploration problems would benefit from identifying representations that correspond to irreversible states (like “broken” or “stuck”). While existing work on examining the representations of neural networks focuses on visualizations, safety-relevant concepts are often difficult to visualize.

Local interpretability techniques that explain specific predictions or decisions are also useful for safety. We could examine whether features that are idiosyncratic to the training environment or indicate proximity to dangerous states influence the agent’s decisions. If the agent can produce a natural language explanation of its actions, how does it explain problematic behavior like reward hacking or going out of its way to disable the off-switch?
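As a rough illustration of what a local explanation can look like, here is a minimal occlusion-style sketch: replace one input feature at a time with a baseline value and record how much the score for a decision changes. The toy linear "agent", its weights, and the observation are all hypothetical, and occlusion is just one simple technique among many, not a method proposed in the talk.

```python
import numpy as np

def occlusion_attribution(score_fn, features, baseline=0.0):
    """Per-feature change in score when that feature is replaced by a baseline."""
    features = np.asarray(features, dtype=float)
    original = score_fn(features)
    attributions = np.zeros_like(features)
    for i in range(features.size):
        occluded = features.copy()
        occluded[i] = baseline          # hide feature i only
        attributions[i] = original - score_fn(occluded)
    return attributions

# Hypothetical toy agent: a linear score for one action over three features.
# The second feature has zero weight, so it should receive zero attribution.
weights = np.array([2.0, 0.0, -1.0])

def toy_score(x):
    return float(weights @ x)

observation = np.array([1.0, 5.0, 3.0])
attributions = occlusion_attribution(toy_score, observation)
print(attributions)
```

If a feature that only exists in the training environment, or one that signals proximity to a dangerous state, received a large attribution here, that would be a prompt to look more closely at the decision.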

There are many ways in which interpretability can be useful for safety. Somewhat less obvious is what safety can do for interpretability: serving as grounding for interpretability questions. As exemplified by the final debate of the symposium, there is an ongoing conversation in the ML community trying to pin down the fuzzy idea of interpretability – what is it, do we even need it, what kind of understanding is useful, etc. I think it’s important to keep in mind that our desire for interpretability is to some extent motivated by our systems being fallible – understanding our AI systems would be less important if they were 100% robust and made no mistakes. From the safety perspective, we can define interpretability as the kind of understanding that helps us ensure the safety of our systems.

For those interested in applying the interpretability hammer to the safety nail, or working on other long-term safety questions, FLI has recently announced a new grant program. Now is a great time for the AI field to think deeply about value alignment. As Pieter Abbeel said at the end of his keynote, “Once you build really good AI contraptions, how do you make sure they align their value system with our value system? Because at some point, they might be smarter than us, and it might be important that they actually care about what we care about.”

(Thanks to Janos Kramar for his feedback on this post, and to everyone at DeepMind who gave feedback on the interpretability talk.)

This article was originally posted here.

Research for Beneficial Artificial Intelligence

Research Goal: The goal of AI research should be to create not undirected intelligence, but beneficial intelligence.

It’s no coincidence that the first Asilomar Principle is about research. On the face of it, the Research Goal Principle may not seem as glamorous or exciting as some of the other Principles that more directly address how we’ll interact with AI and the impact of superintelligence. But it’s from this first Principle that all of the others are derived.

Simply put, without AI research and without specific goals by researchers, AI cannot be developed. However, participating in research and working toward broad AI goals without considering the possible long-term effects of the research could be detrimental to society.

There’s a scene in Jurassic Park, in which Jeff Goldblum’s character laments that the scientists who created the dinosaurs “were so preoccupied with whether or not they could that they didn’t stop to think if they should.” Until recently, AI researchers have also focused primarily on figuring out what they could accomplish, without longer-term considerations, and for good reason: scientists were just trying to get their AI programs to work at all, and the results were far too limited to pose any kind of threat.

But in the last few years, scientists have made great headway with artificial intelligence. The impacts of AI on society are already being felt, and as we’re seeing with some of the issues of bias and discrimination that are already popping up, this isn’t always good.

Attitude Shift

Unfortunately, there’s still a culture within AI research that’s too accepting of the idea that the developers aren’t responsible for how their products are used. Stuart Russell compares this attitude to that of civil engineers, who would never be allowed to say something like, “I just design the bridge; someone else can worry about whether it stays up.”

Joshua Greene, a psychologist from Harvard, agrees. He explains:

“I think that is a bookend to the Common Good Principle [#23] – the idea that it’s not okay to be neutral. It’s not okay to say, ‘I just make tools and someone else decides whether they’re used for good or ill.’ If you’re participating in the process of making these enormously powerful tools, you have a responsibility to do what you can to make sure that this is being pushed in a generally beneficial direction. With AI, everyone who’s involved has a responsibility to be pushing it in a positive direction, because if it’s always somebody else’s problem, that’s a recipe for letting things take the path of least resistance, which is to put the power in the hands of the already powerful so that they can become even more powerful and benefit themselves.”

What’s Beneficial?

Other AI experts I spoke with agreed with the general idea of the Principle, but didn’t see quite eye-to-eye on how it was worded. Patrick Lin, for example, was concerned about the use of the word “beneficial” and what it meant, while John Havens appreciated the word precisely because it forces us to consider what “beneficial” means in this context.

“I generally agree with this research goal,” explained Lin, a philosopher at Cal Poly. “Given the potential of AI to be misused or abused, it’s important to have a specific positive goal in mind. I think where it might get hung up is what this word ‘beneficial’ means. If we’re directing it towards beneficial intelligence, we’ve got to define our terms; we’ve got to define what beneficial means, and that to me isn’t clear. It means different things to different people, and it’s rare that you could benefit everybody.”

Meanwhile, Havens, the Executive Director of The IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems, was pleased the word forced the conversation.

“I love the word beneficial,” Havens said. “I think sometimes inherently people think that intelligence, in one sense, is always positive. Meaning, because something can be intelligent, or autonomous, and that can advance technology, that that is a ‘good thing’. Whereas the modifier ‘beneficial’ is excellent, because you have to define: What do you mean by beneficial? And then, hopefully, it gets more specific, and it’s: Who is it beneficial for? And, ultimately, what are you prioritizing? So I love the word beneficial.”

AI researcher Susan Craw, a professor at Robert Gordon University, also agrees with the Principle but questioned the order of the phrasing.

“Yes, I agree with that,” Craw said, but adds, “I think it’s a little strange the way it’s worded, because of ‘undirected.’ It might even be better the other way around, which is, it would be better to create beneficial research, because that’s a more well-defined thing.”

Long-term Research

Roman Yampolskiy, an AI researcher at the University of Louisville, brings the discussion back to the issues of most concern for FLI:

“The universe of possible intelligent agents is infinite with respect to both architectures and goals. It is not enough to simply attempt to design a capable intelligence, it is important to explicitly aim for an intelligence that is in alignment with goals of humanity. This is a very narrow target in a vast sea of possible goals and so most intelligent agents would not make a good optimizer for our values resulting in a malevolent or at least indifferent AI (which is likewise very dangerous). It is only by aligning future superintelligence with our true goals, that we can get significant benefit out of our intellectual heirs and avoid existential catastrophe.”

And with that in mind, we’re excited to announce we’ve launched a new round of grants! If you haven’t seen the Request for Proposals (RFP) yet, you can find it here. The focus of this RFP is on technical research or other projects enabling development of AI that is beneficial to society, and robust in the sense that the benefits are somewhat guaranteed: our AI systems must do what we want them to do.

If you’re a researcher interested in the field of AI, we encourage you to review the RFP and consider applying.

This article is part of a series on the 23 Asilomar AI Principles. The Principles offer a framework to help artificial intelligence benefit as many people as possible. But, as AI expert Toby Walsh said of the Principles, “Of course, it’s just a start. … a work in progress.” The Principles represent the beginning of a conversation, and now we need to follow up with broad discussion about each individual principle. You can read the discussions about previous principles here.

MIRI’s December 2017 Newsletter and Annual Fundraiser

Our annual fundraiser is live. Discussed in the fundraiser post:

  • News  — What MIRI’s researchers have been working on lately, and more.
  • Goals — We plan to grow our research team 2x in 2018–2019. If we raise $850k this month, we think we can do that without dipping below a 1.5-year runway.
  • Actual goals — A bigger-picture outline of what we think is the likeliest sequence of events that could lead to good global outcomes.

Our funding drive will be running until December 31st.

Research updates

General updates

When Should Machines Make Decisions?

Human Control: Humans should choose how and whether to delegate decisions to AI systems, to accomplish human-chosen objectives.

When is it okay to let a machine make a decision instead of a person? Most of us allow Google Maps to choose the best route to a new location. Many of us are excited to let self-driving cars take us to our destinations while we work or daydream. But are you ready to let your car choose your destination for you? The car might recognize that your ultimate objective is to eat or to shop or to run some errand, but most of the time, we have specific stores or restaurants that we want to go to, and we may not want the vehicle making those decisions for us.

What about more challenging decisions? Should weapons be allowed to choose who to kill? If so, how do they make that choice? And how do we address the question of control when artificial intelligence becomes much smarter than people? If an AI knows more about the world and our preferences than we do, would it be better if the AI made all of our decisions for us?

Questions like these are not easy to address. In fact, two of the AI experts I interviewed responded to this Principle with comments like, “Yeah, this is tough,” and “Right, that’s very, very tricky.”

And everyone I talked to agreed that this question of human control taps into some of the most challenging problems facing the design of AI.

“I think this is hugely important,” said Susan Craw, a Research Professor at Robert Gordon University Aberdeen. “Otherwise you’ll have systems wanting to do things for you that you don’t necessarily want them to do, or situations where you don’t agree with the way that systems are doing something.”

What does human control mean?

Joshua Greene, a psychologist at Harvard, cut right to the most important questions surrounding this Principle.

“This is an interesting one because it’s not clear what it would mean to violate that rule,” Greene explained. “What kind of decision could an AI system make that was not in some sense delegated to the system by a human? AI is a human creation. This principle, in practice, is more about what specific decisions we consciously choose to let the machines make. One way of putting it is that we don’t mind letting the machines make decisions, but whatever decisions they make, we want to have decided that they are the ones making those decisions.

“In, say, a navigating robot that walks on legs like a human, the person controlling it is not going to decide every angle of every movement. The humans won’t be making decisions about where exactly each foot will land, but the humans will have said, ‘I’m comfortable with the machine making those decisions as long as it doesn’t conflict with some other higher level command.’”

Roman Yampolskiy, an AI researcher at the University of Louisville, suggested that we might be even closer to giving AI decision-making power than many realize.

“In many ways we have already surrendered control to machines,” Yampolskiy said. “AIs make over 85% of all stock trades, control operation of power plants, nuclear reactors, electric grid, traffic light coordination and in some cases military nuclear response aka “dead hand.” Complexity and speed required to meaningfully control those sophisticated processes prevent meaningful human control. We are simply not quick enough to respond to ultrafast events, such as those in algorithmic trading and more and more seen in military drones. We are also not capable enough to keep thousands of variables in mind or to understand complicated mathematical models. Our reliance on machines will only increase but as long as they make good decisions (decisions we would make if we were smart enough, had enough data and enough time) we are OK with them making such decisions. It is only in cases where machine decisions diverge from ours that we would like to be able to intervene. Of course figuring out cases in which we diverge is exactly the unsolved Value Alignment Problem.”

Greene also elaborated on this idea: “The worry is when you have machines that are making more complicated and consequential decisions than ‘where to put the next footstep.’ When you have a machine that can behave in an open-ended flexible way, how do you delegate anything without delegating everything? When you have someone who works for you and you have some problem that needs to be solved and you say, ‘Go figure it out,’ you don’t specify, ‘But don’t murder anybody in the process. Don’t break any laws and don’t spend all the company’s money trying to solve this one small-sized problem.’ There are assumptions in the background that are unspecified and fairly loose, but nevertheless very important.

“I like the spirit of this principle. It’s a specification of what follows from the more general idea of responsibility, that every decision is either made by a person or specifically delegated to the machine. But this one will be especially hard to implement once AI systems start behaving in more flexible, open-ended ways.”

Trust and Responsibility

AI is often compared to a child, both in terms of what level of learning a system has achieved and also how the system is learning. And just as we would be with a child, we’re hesitant to give a machine too much control until it’s proved it can be trusted to be safe and accountable. Artificial intelligence systems may have earned our trust when it comes to maps, financial trading, and the operation of power grids, but some question whether this trend can continue as AI systems become even more complex or when safety and well-being are at greater risk.

John Havens, the Executive Director of The IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems, explained, “Until universally systems can show that humans can be completely out of the loop and more often than not it will be beneficial, then I think humans need to be in the loop.”

“However, the research I’ve seen also shows that right now is the most dangerous time, where humans are told, ‘Just sit there, the system works 99% of the time, and we’re good.’ That’s the most dangerous situation,” he added, in reference to recent research that has found people stop paying attention if a system, like a self-driving car, rarely has problems. The research indicates that when problems do arise, people struggle to refocus and address the problem.

“I think it still has to be humans delegating first,” Havens concluded.

In addition to the issues already mentioned with decision-making machines, Patrick Lin, a philosopher at California Polytechnic State University, doesn’t believe it’s clear who would be held responsible if something does go wrong.

“I wouldn’t say that you must always have meaningful human control in everything you do,” Lin said. “I mean, it depends on the decision, but also I think this gives rise to new challenges. … This is related to the idea of human control and responsibility. If you don’t have human control, it could be unclear who’s responsible … the context matters. It really does depend on what kind of decisions we’re talking about, that will help determine how much human control there needs to be.”

Susan Schneider, a philosopher at the University of Connecticut, also worried about how these problems could be exacerbated if we achieve superintelligence.

“Even now it’s sometimes difficult to understand why a deep learning system made the decisions that it did,” she said, adding later, “If we delegate decisions to a system that’s vastly smarter than us, I don’t know how we’ll be able to trust it, since traditional methods of verification seem to break down.”

What do you think?

Should humans be in control of a machine’s decisions at all times? Is that even possible? When is it appropriate for a machine to take over, and when do we need to make sure a person is “awake at the wheel,” so to speak? There are clearly times when machines are more equipped to safely address a situation than humans, but is that all that matters? When are you comfortable with a machine making decisions for you, and when would you rather remain in control?

Help Support FLI This Giving Tuesday

We’ve accomplished a lot. FLI has only been around for a few years, but during that time, we’ve:

  • Helped mainstream AI safety research,
  • Funded 37 AI safety research grants,
  • Launched multiple open letters that have brought scientists and the public together for the common cause of a beneficial future,
  • Drafted the 23 Asilomar Principles which offer guidelines for ensuring that AI is developed beneficially for all,
  • Supported the successful efforts by the International Campaign to Abolish Nuclear Weapons (ICAN) to get a UN treaty passed that bans and stigmatizes nuclear weapons (ICAN won this year’s Nobel Peace Prize for their work),
  • Supported efforts to advance negotiations toward a ban on lethal autonomous weapons with a video that’s been viewed over 30 million times,
  • Launched a website that’s received nearly 3 million page views,
  • Broadened the conversation about how humanity can flourish rather than flounder with powerful technologies.

But that’s just the beginning. There’s so much more we’d like to do, but we need your help. On Giving Tuesday this year, please consider a donation to FLI.

Where would your money go?

  • More AI safety research,
  • More high-quality information and communication about AI safety,
  • More efforts to keep the future safe from lethal autonomous weapons,
  • More efforts to trim excess nuclear stockpiles & reduce nuclear war risk,
  • More efforts to guarantee a future we can all look forward to.

Please Consider a Donation to Support FLI

Harvesting Water Out of Thin Air: A Solution to Water Shortage Crisis?

The following post was written by Jung Hyun Claire Park.

One in nine people around the world do not have access to clean water. As the global population increases and the climate heats up, experts fear water shortages will increase. To address this anticipated crisis, scientists are turning to a natural reserve of fresh water that has yet to be exploited: the atmosphere.

The atmosphere is estimated to contain 13 trillion liters of water vapor and droplets, which could significantly contribute to resolving the water shortage problem. A number of attempts have already been made to collect water from air. Previously, researchers have used porous materials such as zeolites, silica gel, and clay to capture water molecules, but these approaches suffered from several limitations. First, the aforementioned materials work efficiently only in high-humidity conditions. Yet it’s low-humidity areas, like sub-Saharan Africa, which are in greatest need of clean drinking water. Another limitation is that these materials tend to cling too tightly to the water molecules they collect. Thus, these previous methods of collecting water from air have required high energy consumption to release the absorbed water, diminishing their viability as a solution to the water shortage crisis.

Now, Dr. Omar Yaghi and a team of scientists at the Massachusetts Institute of Technology and the University of California, Berkeley have developed a new technology that provides a solution to these limitations. The technology uses a material called a metal-organic framework (MOF) that effectively captures water molecules at low-humidity levels. And the only energy necessary to release drinkable water from the MOFs can be harnessed from ambient sunlight.

How Does This System Work?

MOFs belong to a family of porous compounds whose sponge-like configuration is ideal for trapping molecules. The MOFs can be easily modified at the molecular level to meet various needs, and they are highly customizable. Researchers can modify the type of molecule that’s absorbed, the optimal humidity level for maximum absorption, and the energy required to release trapped molecules, thus yielding a plethora of potential MOF variations. The proposed water harvesting technology uses a hydrophilic variation of MOFs called microcrystalline powder MOF-801. This variation is engineered to more efficiently harvest water from an atmosphere in which the relative humidity level is as low as 20%, the typical level found in the world’s driest regions. Furthermore, the MOF-801 only requires energy from ambient sunlight to relinquish its collected water, which means the energy necessary for this technology is abundant in precisely those desert areas with the most severely limited supply of fresh water. MOF-801 overcomes most, if not all, of the limitations found in the materials that were previously proposed for harvesting water from air.

A schematic of a metal-organic framework (MOF). The yellow balls represent the porous space where molecules are captured. The lines are organic linkers, and the blue intersections are metal ions. UC Berkeley, Berkeley Lab image

The prototype is shaped like a rectangular prism and it operates through a simple mechanism. To collect water from the atmosphere, MOF is pressed into a thin sheet of copper metal and placed under the solar absorber located on top of the prism. The condenser plate is placed at the bottom and is kept at room temperature. Once the top layer absorbs solar heat, water is released from the MOF and collected in the cooler bottom layer due to concentration and temperature differences. Tests showed that one kilogram (about 2.2 pounds) of MOF can collect about 2.8 L of water per day. Yaghi notes that since the technology collects distilled water, all that’s needed is the addition of mineral ions. He suggests that one kilogram of MOF will be able to produce enough drinkable water per day for a person living in some of the driest regions on earth.
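To get a feel for the scale involved, the reported yield of about 2.8 L per kilogram of MOF per day can be turned into a back-of-the-envelope estimate. The sketch below assumes the yield scales linearly with the mass of MOF, which the tests described here do not establish, and the household size and per-person water need are hypothetical inputs.

```python
LITERS_PER_KG_PER_DAY = 2.8  # yield reported in the tests described above

def mof_mass_needed_kg(liters_per_day):
    """Estimated MOF mass for a daily water target, assuming linear scaling."""
    if liters_per_day < 0:
        raise ValueError("liters_per_day must be non-negative")
    return liters_per_day / LITERS_PER_KG_PER_DAY

# Hypothetical household: 4 people, assumed 3 L of drinking water each per day.
household_need_liters = 4 * 3.0
household_mof_kg = mof_mass_needed_kg(household_need_liters)
print(f"{household_mof_kg:.1f} kg of MOF")
```

Under these assumptions the household would need a little over 4 kg of MOF; real-world figures would depend on humidity, sunlight, and device design.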

Image of a water harvesting prototype with MOF-801, with outer dimensions of 7 cm × 7 cm × 4.5 cm. MIT.

Why This Technology Is Promising

The promise of this technology mostly lies in its sustainability. Water can be pulled from the air without any energy input beyond that which can be collected from the ambient sunlight. In addition, MOF-801 is a zirconium-based compound that is widely available at a low cost. And the technology has a long life span: Yaghi predicts that the MOF will last through at least 100,000 cycles of water absorption and desorption, and thus it does not require frequent replacement. Plus, the water harvesting technology employing MOF isn’t limited to drinking water. It could be used for any service requiring water, such as agriculture. Yaghi believes that this water harvesting technology could pose a viable solution for water shortage problems in various regions of the world.

Yaghi also anticipates that the material itself could be used for the separation, storage, and catalysis of molecules other than water as well. For instance, MOFs can be tailored to capture carbon emissions before those emissions reach the atmosphere. Or they may be designed to remove existing CO2 from the atmosphere. MOF, as the name suggests, is simply a framework, and thus it has opened up many opportunities for modification to suit practical needs.

Future of Water Harvesting Technology

The team of researchers from Berkeley and MIT are currently pushing to test the water harvesting technology in real-life settings in regions with low humidity levels. Yaghi remarked that his ultimate goal would be to “have drinking water widely available, especially in areas that lack clean water.” He envisions providing water to villages that are “off-grid,” where each household will have a machine and create their own “personalized water.” And he hopes his envisioned future may not be too far away.

AI Researchers Create Video to Call for Autonomous Weapons Ban at UN

In response to growing concerns about autonomous weapons, a coalition of AI researchers and advocacy organizations released a fictitious video on Monday that depicts a disturbing future in which lethal autonomous weapons have become cheap and ubiquitous.

The video was launched in Geneva, where AI researcher Stuart Russell presented it at an event at the United Nations Convention on Conventional Weapons hosted by the Campaign to Stop Killer Robots.

Russell, in an appearance at the end of the video, warns that the technology described in the film already exists and that the window to act is closing fast.

Support for a ban has been mounting. Just this past week, over 200 Canadian scientists and over 100 Australian scientists in academia and industry penned open letters to Prime Ministers Justin Trudeau and Malcolm Turnbull urging them to support the ban. Earlier this summer, over 130 leaders of AI companies signed a letter in support of this week’s discussions. These letters follow a 2015 open letter released by the Future of Life Institute and signed by more than 20,000 AI/Robotics researchers and others, including Elon Musk and Stephen Hawking.

These letters indicate both grave concern and a sense that the opportunity to curtail lethal autonomous weapons is running out.

Noel Sharkey of the International Committee for Robot Arms Control explains, “The Campaign to Stop Killer Robots is not trying to stifle innovation in artificial intelligence and robotics and it does not wish to ban autonomous systems in the civilian or military world. Rather we see an urgent need to prevent automation of the critical functions for selecting targets and applying violent force without human deliberation and to ensure meaningful human control for every attack.”

Drone technology today is very close to having fully autonomous capabilities. And many of the world’s leading AI researchers worry that if these autonomous weapons are ever developed, they could dramatically lower the threshold for armed conflict, ease and cheapen the taking of human life, empower terrorists, and create global instability. The US and other nations have used drones and semi-automated systems to carry out attacks for several years now, but fully removing a human from the loop is at odds with international humanitarian and human rights law.

A ban can exert great power on the trajectory of technological development without needing to stop every instance of misuse. Max Tegmark, MIT Professor and co-founder of the Future of Life Institute, points out, “People’s knee-jerk reaction that bans can’t help isn’t historically accurate: the bioweapon ban created such a powerful stigma that, despite treaty cheating, we have almost no bioterror attacks today and almost all biotech funding is civilian.”

As Toby Walsh, an AI professor at the University of New South Wales, argues: “The academic community has sent a clear and consistent message. Autonomous weapons will be weapons of terror, the perfect tool for those who have no qualms about the terrible uses to which they are put. We need to act now before this future arrives.”

More than 70 countries are participating in the meeting taking place November 13 – 17 organized by the 2016 Fifth Review Conference at the UN, which established a Group of Governmental Experts on lethal autonomous weapons. The meeting is chaired by Ambassador Amandeep Singh Gill of India, and the countries will continue negotiations of what could become an historic international treaty.

Developing Ethical Priorities for Neurotechnologies and AI

Private companies and military sectors have moved beyond the goal of merely understanding the brain to that of augmenting and manipulating brain function. In particular, companies such as Elon Musk’s Neuralink and Bryan Johnson’s Kernel are hoping to harness advances in computing and artificial intelligence alongside neuroscience to provide new ways to merge our brains with computers.

Musk also sees this as a means to help address both AI safety and human relevance as algorithms outperform humans in one area after another. He has previously stated, “Some high bandwidth interface to the brain will be something that helps achieve a symbiosis between human and machine intelligence and maybe solves the control problem and the usefulness problem.”

In a comment in Nature, 27 people from The Morningside Group outlined four ethical priorities for the emerging space of neurotechnologies and artificial intelligence. The authors include neuroscientists, ethicists and AI engineers from Google, top US and global universities, and several non-profit research organizations such as AI Now and The Hastings Center.

A Newsweek article describes their concern this way: “Artificial intelligence could hijack brain-computer interfaces and take control of our minds.” While this is not exactly the warning the Group describes, they do suggest we are in store for some drastic changes:

…we are on a path to a world in which it will be possible to decode people’s mental processes and directly manipulate the brain mechanisms underlying their intentions, emotions and decisions; where individuals could communicate with others simply by thinking; and where powerful computational systems linked directly to people’s brains aid their interactions with the world such that their mental and physical abilities are greatly enhanced.

The authors suggest that although these advances could provide meaningful and beneficial enhancements to the human experience, they could also exacerbate social inequalities, enable more invasive forms of social manipulation, and threaten core fundamentals of what it means to be human. They encourage readers to consider the ramifications of these emerging technologies now.

Referencing the Asilomar AI Principles and other ethical guidelines as a starting point, they call for a new set of guidelines that specifically address concerns that will emerge as groups like Elon Musk’s startup Neuralink and other companies around the world explore ways to improve the interface between brains and machines. Their recommendations cover four key areas: privacy and consent; agency and identity; augmentation; and bias.

Regarding privacy and consent, they posit that the right to keep neural data private is critical. To this end, they recommend opt-in policies, strict regulation of commercial entities, and the use of blockchain-based techniques to provide transparent control over the use of data. In relation to agency and identity, they recommend that bodily and mental integrity, as well as the ability to choose our actions, be enshrined in international treaties such as the Universal Declaration of Human Rights.

In the area of augmentation, the authors discuss the possibility of an augmentation arms race in pursuit of so-called “super-soldiers” who are more resilient to combat conditions. They recommend that the use of neural technology for military purposes be stringently regulated. And finally, they recommend the exploration of countermeasures, as well as diversity in the design process, in order to prevent widespread bias in machine learning applications.

The ways in which AI will increasingly connect with our bodies and brains pose challenging safety and ethical concerns that will require input from a vast array of people. As Dr. Rafael Yuste of Columbia University, a neuroscientist who co-authored the essay, told STAT, “the ethical thinking has been insufficient. Science is advancing to the point where suddenly you can do things you never would have thought possible.”

MIRI’s November 2017 Newsletter

Eliezer Yudkowsky has written a new book on civilizational dysfunction and outperformance: Inadequate Equilibria: Where and How Civilizations Get Stuck. The full book will be available in print and electronic formats November 16. To preorder the ebook or sign up for updates, visit equilibriabook.com.

We’re posting the full contents online in stages over the next two weeks. The first two chapters are:

  1. Inadequacy and Modesty (discussion: LessWrong, EA Forum, Hacker News)
  2. An Equilibrium of No Free Energy (discussion: LessWrong, EA Forum)

Scientists to Congress: The Iran Deal is a Keeper

The following article was written by Dr. Lisbeth Gronlund and originally posted on the Union of Concerned Scientists blog.

The July 2015 Iran Deal, which places strict, verified restrictions on Iran’s nuclear activities, is again under attack by President Trump. This time he’s kicked responsibility over to Congress to “fix” the agreement and promised that if Congress fails to do so, he will withdraw from it.

As the New York Times reported, in response to this development over 90 prominent scientists sent a letter to leading members of Congress yesterday urging them to support the Iran Deal—making the case that continued US participation will enhance US security.

Many of these scientists also signed a letter strongly supporting the Iran Deal to President Obama in August 2015, as well as a letter to President-elect Trump in January. In all three cases, the first signatory is Richard L. Garwin, a long-standing UCS board member who helped develop the H-bomb as a young man and has since advised the government on all manner of security issues. Last year, he was awarded a Presidential Medal of Freedom.

What’s the Deal?

If President Trump did pull out of the agreement, what would that mean? First, the Joint Comprehensive Plan of Action (JCPoA), as it is formally named, is not an agreement between just Iran and the US; it also includes China, France, Germany, Russia, the UK, and the European Union. So the agreement will continue unless Iran responds by quitting as well. (More on that later.)

The Iran Deal is not a treaty, and did not require Senate ratification. Instead, the United States participates in the JCPoA by presidential action. However, Congress wanted to get into the act and passed the Iran Nuclear Agreement Review Act of 2015, which requires the president to certify every 90 days that Iran remains in compliance.

President Trump has done so twice, but declined to do so this month and instead called for Congress—and US allies—to work with the administration “to address the deal’s many serious flaws.” Among those supposed flaws is that the deal covering Iran’s nuclear activities does not also cover its missile activities!

According to President Trump’s October 13 remarks:

Key House and Senate leaders are drafting legislation that would amend the Iran Nuclear Agreement Review Act to strengthen enforcement, prevent Iran from developing an inter– —this is so totally important—an intercontinental ballistic missile, and make all restrictions on Iran’s nuclear activity permanent under US law.

The Reality

First, according to the International Atomic Energy Agency, which verifies the agreement, Iran remains in compliance. This was echoed by Norman Roule, who retired this month after working at the CIA for three decades. He served as the point person for US intelligence on Iran under multiple administrations. He told an NPR interviewer, “I believe we can have confidence in the International Atomic Energy Agency’s efforts.”

Second, the Iran Deal was the product of several years of negotiations. Not surprisingly, recent statements by the United Kingdom, France, Germany, the European Union, and Iran make clear that they will not agree to renegotiate the agreement. It just won’t happen. US allies are highly supportive of the Iran Deal.

Third, Congress can change US law by amending the Iran Nuclear Agreement Review Act, but this will have no effect on the terms of the Iran Deal. This may be a face-saving way for President Trump to stay with the agreement—for now. However, such amendments will lay the groundwork for a future withdrawal and give credence to President Trump’s claims that the agreement is a “bad deal.” That’s why the scientists urged Congress to support the Iran Deal as it is.

The End of a Good Deal?

If President Trump pulls out of the Iran Deal and reimposes sanctions against Iran, our allies will urge Iran to stay with the deal. But Iran has its own hardliners who want to leave the deal—and a US withdrawal is exactly what they are hoping for.

If Iran leaves the agreement, President Trump will have a lot to answer for. Here is an agreement that significantly extends the time it would take for Iran to produce enough material for a nuclear weapon, and that would alert the world if Iran started to do so. For the United States to throw that out the window would be deeply irresponsible. It would undermine not just its own security, but also that of Iran’s neighbors and the rest of the world.

Congress should do all it can to prevent this outcome. The scientists sent their letter to Senators Corker and Cardin, who are the Chairman and Ranking Member of the Senate Foreign Relations Committee, and to Representatives Royce and Engel, who are the Chairman and Ranking Member of the House Foreign Affairs Committee, because these men have a special responsibility on issues like these.

Let’s hope these four men will do what’s needed to prevent the end of a good deal—a very good deal.

55 Years After Preventing Nuclear Attack, Arkhipov Honored With Inaugural Future of Life Award

London, UK – On October 27, 1962, a soft-spoken naval officer named Vasili Arkhipov single-handedly prevented nuclear war during the height of the Cuban Missile Crisis. Arkhipov’s submarine captain, thinking their sub was under attack by American forces, wanted to launch a nuclear weapon at the ships above. Arkhipov, with the power of veto, said no, thus averting nuclear war.

Now, 55 years after his courageous actions, the Future of Life Institute has presented the Arkhipov family with the inaugural Future of Life Award to honor humanity’s late hero.

Arkhipov’s surviving family members, represented by his daughter Elena and grandson Sergei, flew into London for the ceremony, which was held at the Institution of Engineering and Technology. After explaining Arkhipov’s heroics to the audience, Max Tegmark, president of FLI, presented the Arkhipov family with their award and $50,000. Elena and Sergei were both honored by the gesture and by the overall message of the award.

Elena explained that her father “always thought that he did what he had to do and never consider his actions as heroism. … Our family is grateful for the prize and considers it as a recognition of his work and heroism. He did his part for the future so that everyone can live on our planet.”

Elena and Sergei with the Future of Life Award

The Future of Life Award seeks to recognize and reward those who take exceptional measures to safeguard the collective future of humanity. Arkhipov, whose courage and composure potentially saved billions of lives, was an obvious choice for the inaugural event.

“Vasili Arkhipov is arguably the most important person in modern history, thanks to whom October 27 2017 isn’t the 55th anniversary of World War III,” FLI president Max Tegmark explained. “We’re showing our gratitude in a way he’d have appreciated, by supporting his loved ones.”

The award also aims to foster a dialogue about the growing existential risks that humanity faces, and the people that work to mitigate them.

Jaan Tallinn, co-founder of FLI, said: “Given that this century will likely bring technologies that can be even more dangerous than nukes, we will badly need more people like Arkhipov — people who will represent humanity’s interests even in the heated moments of a crisis.”

FLI president Max Tegmark presenting the Future of Life Award to Arkhipov’s daughter, Elena, and grandson, Sergei.

Arkhipov’s Story

On October 27, 1962, during the Cuban Missile Crisis, eleven US Navy destroyers and the aircraft carrier USS Randolph had cornered the Soviet submarine B-59 near Cuba, in international waters outside the US “quarantine” area. Arkhipov was one of the officers on board. The crew had had no contact with Moscow for days and didn’t know whether World War III had already begun. Then the Americans started dropping small depth charges at them. Unbeknownst to the crew, the Americans had informed Moscow that these charges were merely meant to force the sub to surface and leave.

“We thought – that’s it – the end,” crewmember V.P. Orlov recalled. “It felt like you were sitting in a metal barrel, which somebody is constantly blasting with a sledgehammer.”

What the Americans didn’t know was that the B-59 crew had a nuclear torpedo that they were authorized to launch without clearing it with Moscow. As the depth charges intensified and temperatures onboard climbed above 45°C (113°F), many crew members fainted from carbon dioxide poisoning, and in the midst of this panic, Captain Savitsky decided to launch their nuclear weapon.

“Maybe the war has already started up there,” he shouted. “We’re gonna blast them now! We will die, but we will sink them all – we will not disgrace our Navy!”

The combination of depth charges, extreme heat, stress, and isolation from the outside world almost lit the fuse of full-scale nuclear war. But it didn’t. The decision to launch a nuclear weapon had to be authorized by three officers on board, and one of them, Vasili Arkhipov, said no.

Amidst the panic, the 34-year-old Arkhipov remained calm and tried to talk Captain Savitsky down. He eventually convinced Savitsky that these depth charges were signals for the Soviet submarine to surface, and the sub surfaced safely and headed north, back to the Soviet Union.

It is sobering that very few have heard of Arkhipov, although his decision was perhaps the most valuable individual contribution to human survival in modern history. PBS made a documentary, The Man Who Saved the World, about Arkhipov’s moving heroism, and National Geographic profiled him as well in an article titled “You (and almost everyone you know) Owe Your Life to This Man.”

The Cold War never became a hot war, in large part thanks to Arkhipov, but the threat of nuclear war remains high. Beatrice Fihn, Executive Director of the International Campaign to Abolish Nuclear Weapons (ICAN) and this year’s recipient of the Nobel Peace Prize, hopes that the Future of Life Award will help draw attention to the current threat of nuclear weapons and encourage more people to stand up to that threat. Fihn explains: “Arkhipov’s story shows how close to nuclear catastrophe we have been in the past. And as the risk of nuclear war is on the rise right now, all states must urgently join the Treaty on the Prohibition of Nuclear Weapons to prevent such catastrophe.”

Of her father’s role in preventing nuclear catastrophe, Elena explained: “We must strive so that the powerful people around the world learn from Vasili’s example. Everybody with power and influence should act within their competence for world peace.”

Understanding Artificial General Intelligence — An Interview With Hiroshi Yamakawa

Artificial general intelligence (AGI) is something of a holy grail for many artificial intelligence researchers. Today’s narrow AI systems are only capable of specific tasks — such as internet searches, driving a car, or playing a video game — but none of the systems today can do all of these tasks. A single AGI would be able to accomplish a breadth and variety of cognitive tasks similar to that of people.

How close are we to developing AGI? How can we ensure that the power of AGI will benefit the world, and not just the group who develops it first? Will AGI become an existential threat for humanity, or an existential hope?

Dr. Hiroshi Yamakawa, Director of Dwango AI Laboratory, is one of the leading AGI researchers in Japan. Members of the Future of Life Institute sat down with Dr. Yamakawa and spoke with him about AGI and his lab’s progress in developing it. In this interview, Dr. Yamakawa explains how AI can model the human brain, his vision of a future where humans coexist with AGI, and why the Japanese think of AI differently than many in the West.

This transcript has been heavily edited for brevity. You can see the full conversation here.

Why did the Dwango Artificial Intelligence Laboratory make a large investment in [AGI]?

HY: Usable AI that has been developed up to now is essentially for solving specific areas or addressing a particular problem. Rather than just solving a number of problems using experience, AGI, we believe, will be more similar to human intelligence that can solve various problems which were not assumed in the design phase.

What is the advantage of the Whole Brain Architecture approach?

HY: The whole brain architecture is an engineering-based research approach “to create a human-like artificial general intelligence (AGI) by learning from the architecture of the entire brain.” Basically, this approach to building AGI is the integration of artificial neural networks and machine-learning modules while using the brain’s hard wiring as a reference.

I think it will be easier to create an AI with the same behavior and sense of values as humans this way. Even if superintelligence exceeds human intelligence in the near future, it will be comparatively easy to communicate with AI designed to think like a human, and this will be useful as machines and humans continue to live and interact with each other.

General intelligence is a function of many combined, interconnected features produced by learning, so we cannot manually break down these features into individual parts. Because of this difficulty, one meaningful characteristic of whole brain architecture is that though based on brain architecture, it is designed to be a functional assembly of parts that can still be broken down and used.

The functional parts of the brain are to some degree already present in artificial neural networks. It follows that we can build a roadmap of AGI based on these technologies as pieces and parts.

It is now said that convolutional neural networks have essentially outperformed the system/interaction between the temporal lobe and visual cortex in terms of image recognition tasks. At the same time, deep learning has been used to achieve very accurate voice recognition. In humans, the neocortex contains about 14 billion neurons, but about half of those can be partially explained with deep learning. From this point on, we need to come closer to simulating the functions of different structures of the brain, and even without the whole brain architecture, we need to be able to assemble several structures together to reproduce some behavioral level functions. Then, I believe, we’ll have a path to expand that development process to cover the rest of the brain functions, and finally integrate them as a whole brain.

You also started a non-profit, the Whole Brain Architecture Initiative. How does the non-profit’s role differ from the commercial work?

HY: The Whole Brain Architecture Initiative serves as an organization that helps promote whole brain AI architecture R&D as a whole.

The Basic Ideas of the WBAI:

  • Our vision is to create a world in which AI exists in harmony with humanity.
  • Our mission is to promote the open development of whole brain architecture.
    • In order to make human-friendly artificial general intelligence a public good for all of mankind, we seek to continually expand open, collaborative efforts to develop AI based on an architecture modeled after the brain.
  • Our values are Study, Imagine and Build.
    • Study: Deepen and spread our expertise.
    • Imagine: Broaden our views through public dialogue.
    • Build: Create AGI through open collaboration.

What do you think poses the greatest existential risk to global society in the 21st century?

HY: The risk is not just limited to AI; basically, as human scientific and technological abilities expand, and we become more empowered, risks will increase, too.

Imagine a large field where everyone only has weapons as dangerous as bamboo spears. The risk that human beings would go extinct by killing each other is extremely small. On the other hand, as technologies develop, we have bombs in a very small room and no matter who detonates the bomb, we approach a state of annihilation. That risk should concern everyone.

If there are only 10 people in the room, they will mutually monitor and trust each other. However, imagine trusting 10 billion people each with the ability to destroy everyone — such a scenario is beyond our ability to comprehend. Of course, technological development will advance not only offensive power but also defensive power, but it is not easy to have defensive power to contain attacking power at the same time. If scientific and technological development are promoted using artificial intelligence technology, for example, many countries will easily hold intercontinental ballistic fleets, and artificial intelligence can be extremely dangerous to living organisms by using nanotechnology. It could comprise a scenario to extinguish mankind by the development or use of dangerous substances. Generally speaking, new offensive weapons are developed utilizing the progress of technology, and defensive weapons are developed to neutralize them. Therefore, it is inevitable that periods will exist where the offensive power needed to destroy humanity exceeds its defensive power.

What do you think is the greatest benefit that AGI can bring society?

HY: AGI’s greatest benefit comes from acceleration of development for science and technology. More sophisticated technology will offer solutions for global problems such as environmental issues, food problems and space colonization.

Here I would like to share my vision for the future: “In a desirable future, the happiness of all humans will be balanced against the survival of humankind under the support of superintelligence. In that future, society will be an ecosystem formed by augmented human beings and various public AIs, in what I dub ‘an ecosystem of shared intelligent agents’ (EcSIA).

“Although no human can completely understand EcSIA—it is too complex and vast—humans can control its basic directions. In implementing such control, the grace and wealth that EcSIA affords needs to be properly distributed to everyone.”

Assuming no global catastrophe halts progress, what are the odds of human level AGI in the next 10 years?

HY: I think there’s a possibility that it can happen soon, but taking the average of the estimates of people involved in WBAI, we came up with 2030.

In my current role as the editorial chairman for the Japanese Society for Artificial Intelligence (JSAI) journal, I’m promoting a plan to have a series of discussions starting in the July edition on the theme of “Singularity and AI,” in which we’ll have AI specialists discuss the singularity from a technical viewpoint. I want to help spread calm, technical views on the issue in this way, starting in Japan.

Once human level AGI is achieved, how long would you expect it to take for it to self-modify its way up to massive superhuman intelligence?

HY: If human-level AGI is achieved, it could take on the role of an AI researcher itself. Therefore, immediately after the AGI is built, it could start rapidly cultivating great numbers of AI-researcher AIs that work 24/7, and AI R&D would be drastically accelerated.

What probability do you assign to negative consequences as a result of badly done AI design or operation?

HY: If you include the risk of something like some company losing a lot of money, that will definitely happen.

The range of things that can be done with AI is becoming wider, and the disparity will widen between those who profit from it and those who do not. When that happens, the bad economic situation will give rise to dissatisfaction with the system, and that could create a breeding ground for war and strife. This could be perceived as the evils brought about by capitalism. It’s important that we try to curtail the causes of instability as much as possible.

Is it too soon for us to be researching AI Safety?

HY: I do not think it is at all too early to act for safety, and I think we should progress forward quickly. If possible, we should have several methods to be able to calculate the existential risk brought about by AGI.

Is there anything you think that the AI research community should be more aware of, more open about, or taking more action on?

HY: There are a number of actions that are obviously necessary. Based on this notion, we have established a number of measures like the Japanese Society for Artificial Intelligence Ethics in May 2015 (http://ai-elsi.org/ [in Japanese]), and subsequent Ethical Guidelines for AI researchers (http://ai-elsi.org/archives/514).

A majority of the content of these ethical guidelines expresses the standpoint that researchers should move forward with research that contributes to humanity and society. Additionally, one special characteristic of these guidelines is that the ninth principle listed, a call for ethical compliance of AI itself, states that AI in the future should also abide by the same ethical principles as AI researchers.

Japan, as a society, seems more welcoming of automation. Do you think the Japanese view of AI is different than that in the West?

HY: If we look at things from the standpoint of a moral society, we are all human, and without even looking from the viewpoints of one country or another, in general we should start with the mentality that we have more common characteristics than different.

When looking at AI from the traditional background of Japan, there is a strong influence from beliefs that spirits or “kami” are dwelling in all things. The boundary between living things and humans is relatively unclear, and along the same lines, the same boundaries for AI and robots are unclear. For this reason, in the past, robotic characters like “Tetsuwan Atom” (Astro Boy) and Doraemon were depicted as living and existing in the same world as humans, a theme that has been pervasive in Japanese anime for a long time.

From here on out, we will see humans and AI not as separate entities. Rather I think we will see the appearance of new combinations of AI and humans. Becoming more diverse in this way will certainly improve our chances of survival.

As a very personal view, I think that “surviving intelligence” is something that should be preserved in the future because I feel that it is very fortunate that we have established an intelligent society now, beyond the stormy sea of evolution. Imagine a future in which our humanity is living with intelligent extraterrestrials after first contact. We will start caring about not only the survival of humanity but also that of intelligent extraterrestrials. If that happens, one future scenario is that our dominant values will be extended to the survival of intelligence rather than the survival of the human race itself.

Hiroshi Yamakawa is the Director of Dwango AI Laboratory, Director and Chief Editor of the Japanese Society for Artificial Intelligence, a Fellow Researcher at the Brain Science Institute at Tamagawa University, and the Chairperson of the Whole Brain Architecture Initiative. He specializes in cognitive architecture, concept acquisition, neuro-computing, and opinion collection. He is one of the leading researchers working on AGI in Japan.

To learn more about Dr. Yamakawa’s work, you can read the full interview transcript here.

This interview was prepared by Eric Gastfriend, Jason Orlosky, Mamiko Matsumoto, Benjamin Peterson, Kazue Evans, and Tucker Davey. Original interview date: April 5, 2017. 

DeepMind’s AlphaGo Zero Becomes Go Champion Without Human Input

DeepMind’s AlphaGo Zero AI program just became the Go champion of the world without human data or guidance. This new system marks a significant technological jump from the AlphaGo program which beat Go champion Lee Sedol in 2016.

The game of Go has been played for more than 2,500 years and is widely viewed as not only a game, but a complex art form. And a popular one at that. When the artificially intelligent AlphaGo from DeepMind played its first game against Sedol in March 2016, 60 million viewers tuned in to watch in China alone. AlphaGo went on to win four of five games, surprising the world and signifying a major achievement in AI research.

Unlike the chess match between Deep Blue and Garry Kasparov in 1997, AlphaGo did not win by brute force computing alone. The more complex programming of AlphaGo amazed viewers not only with the excellence of its play, but also with its creativity. The famous “move 37” in game two was described by Go player Fan Hui as “So beautiful.” It was also so unusual that one of the commentators thought it was a mistake. Fan Hui explained, “It’s not a human move. I’ve never seen a human play this move.”

In other words, AlphaGo not only signified an iconic technological achievement, but also shook deeply held social and cultural beliefs about mastery and creativity. Yet, it turns out that AlphaGo was only the beginning. Today, DeepMind announced AlphaGo Zero.

Unlike AlphaGo, AlphaGo Zero was not shown a single human game of Go from which to learn. AlphaGo Zero learned entirely from playing against itself, with no prior knowledge of the game. Although its first games were random, the system used what DeepMind is calling a novel form of reinforcement learning to combine a neural network with a powerful search algorithm to improve each time it played.

In a DeepMind blog about the announcement, the authors write, “This technique is more powerful than previous versions of AlphaGo because it is no longer constrained by the limits of human knowledge. Instead, it is able to learn tabula rasa from the strongest player in the world: AlphaGo itself.”

Though previous AIs from DeepMind have mastered Atari games without human input, as the authors of the Nature article note, “the game of Go, widely viewed as the grand challenge for artificial intelligence, [requires] a precise and sophisticated lookahead in vast search spaces.” While the old Atari games were much more straightforward, the new AI system for AlphaGo Zero had to master the strategy for immediate moves, as well as how to anticipate moves that might be played far into the future.

That all of this was done without human demonstrations also takes the program a step beyond the original AlphaGo systems. But in addition to that, this new system learned with fewer input features than its predecessors, and while the original AlphaGo systems required two separate neural networks, AlphaGo Zero was built with only one.

AlphaGo Zero is not just marginally better than its predecessor, but is in an entirely new class of “superhuman performance” with an intelligence that is notably more general. After just three days of playing against itself (4.9 million times), AlphaGo Zero beat AlphaGo by 100 games to 0. It independently learned the ancient secrets of the masters, but also chose moves and developed strategies never before seen among human players.

Co-founder and CEO of DeepMind, Demis Hassabis, said: “It’s amazing to see just how far AlphaGo has come in only two years. AlphaGo Zero is now the strongest version of our program and shows how much progress we can make even with less computing power and zero use of human data.”

Hassabis continued, “Ultimately we want to harness algorithmic breakthroughs like this to help solve all sorts of pressing real world problems like protein folding or designing new materials. If we can make the same progress on these problems that we have with AlphaGo, it has the potential to drive forward human understanding and positively impact all of our lives.”