AI Safety Research

Bas Steunebrink

Artificial Intelligence / Machine Learning, Postdoctoral Researcher

IDSIA (Dalle Molle Institute for Artificial Intelligence)

Project: Experience-based AI (EXPAI)

Amount Recommended: $196,650

Project Summary

As it becomes ever clearer how machines with a human level of intelligence can be built — and indeed that they will be built — there is a pressing need to discover ways to ensure that such machines will robustly remain benevolent, especially as their intellectual and practical capabilities come to surpass ours. Through self-modification, highly intelligent machines may be capable of breaking important constraints imposed initially by their human designers. The currently prevailing technique for studying the conditions for preventing this danger is based on forming mathematical proofs about the behavior of machines under various constraints. However, this technique suffers from inherent paradoxes and requires unrealistic assumptions about our world, thus not proving much at all.

Recently a class of machines that we call experience-based artificial intelligence (EXPAI) has emerged, enabling us to approach the challenge of ensuring robust benevolence from a promising new angle. This approach is based on studying how a machine’s intellectual growth can be molded over time, as the machine accumulates real-world experience, and putting the machine under pressure to test how it handles the struggle to adhere to imposed constraints.

The Swiss AI lab IDSIA will deliver a widely applicable EXPAI growth control methodology.

Technical Abstract

Whenever one wants to verify that a recursively self-improving system will robustly remain benevolent, the prevailing tendency is to look towards formal proof techniques, which, however, have several issues: (1) Proofs rely on idealized assumptions that inaccurately and incompletely describe the real world and the constraints we mean to impose. (2) Proof-based self-modifying systems run into logical obstacles due to Löb’s theorem, causing them to progressively lose trust in future selves or offspring. (3) Finding nontrivial candidates for provably beneficial self-modifications requires either tremendous foresight or intractable search.
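For reference, the theorem named in issue (2) can be stated as follows. The notation is an assumption of this note rather than the project's own: T is a sufficiently strong formal theory (for example Peano arithmetic), and \Box_T P abbreviates "P is provable in T".

```latex
\[
  \text{If } T \vdash (\Box_T P \rightarrow P), \text{ then } T \vdash P .
\]
```

Read informally, T can prove "if P is provable, then P holds" only for statements P that it already proves. One common reading of the obstacle is that a system reasoning in T therefore cannot establish blanket trust in the conclusions of a successor that reasons in the same T.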

Recently a class of AGI-aspiring systems that we call experience-based AI (EXPAI) has emerged, which fix, circumvent, or trivialize these issues. They are self-improving systems that make tentative, additive, reversible, very fine-grained modifications, without prior self-reasoning; instead, self-modifications are tested over time against experiential evidences and slowly phased in when vindicated or dismissed when falsified. We expect EXPAI to have high impact due to its practicality and tractability. Therefore we must now study how EXPAI implementations can be molded and tested during their early growth period to ensure their robust adherence to benevolence constraints.
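The modification cycle described above can be illustrated with a toy sketch. This is not the EXPAI implementation: the class names, the smoothed success rate used as a confidence score, and the thresholds (`phase_in`, `dismiss`, `step`) are all hypothetical choices made only to show tentative, additive, reversible changes being phased in or dismissed by evidence.

```python
from dataclasses import dataclass


@dataclass
class Modification:
    """A small, additive change that can be dropped again at any time."""
    name: str
    weight: float = 0.1   # current influence on behavior, between 0 and 1
    successes: int = 0    # experiences that vindicated the change
    failures: int = 0     # experiences that falsified it

    def confidence(self) -> float:
        # Laplace-smoothed success rate (hypothetical scoring rule);
        # equals 0.5 before any evidence has arrived.
        return (self.successes + 1) / (self.successes + self.failures + 2)


class Agent:
    def __init__(self, phase_in=0.75, dismiss=0.3, step=0.1):
        self.mods = {}            # name -> Modification, all still revocable
        self.phase_in = phase_in  # confidence needed to gain influence
        self.dismiss = dismiss    # confidence at which a change is dropped
        self.step = step          # how much influence grows per vindication

    def propose(self, name: str) -> None:
        # Tentative and additive: the change starts with little influence
        # and nothing existing is overwritten.
        self.mods.setdefault(name, Modification(name))

    def observe(self, name: str, vindicated: bool) -> None:
        mod = self.mods.get(name)
        if mod is None:
            return  # the change was already dismissed
        if vindicated:
            mod.successes += 1
        else:
            mod.failures += 1
        conf = mod.confidence()
        if conf <= self.dismiss:
            del self.mods[name]  # reversible: falsified changes are removed
        elif conf >= self.phase_in:
            mod.weight = min(1.0, mod.weight + self.step)  # slow phase-in


agent = Agent()
agent.propose("prefer-slow-approach")
agent.propose("ignore-stop-signal")
for _ in range(5):
    agent.observe("prefer-slow-approach", vindicated=True)
    agent.observe("ignore-stop-signal", vindicated=False)
print(sorted(agent.mods))
```

In this sketch no change is ever justified in advance by reasoning about the agent's own code; a change only gains influence after repeated experience supports it, which is the contrast with proof-based self-modification drawn above.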

In this project, the Swiss AI lab IDSIA will deliver an EXPAI growth control methodology that shall be widely applicable.


How Can AI Learn to Be Safe?

As artificial intelligence improves, machines will soon be equipped with intellectual and practical capabilities that surpass those of the smartest humans. But not only will machines be more capable than people, they will also be able to make themselves better. That is, these machines will understand their own design and how to improve it – or they could create entirely new machines that are even more capable.

The human creators of AIs must be able to trust these machines to remain safe and beneficial even as they self-improve and adapt to the real world.

Recursive Self-Improvement

This idea of an autonomous agent making increasingly better modifications to its own code is called recursive self-improvement. Through recursive self-improvement, a machine can adapt to new circumstances and learn how to deal with new situations.

To a certain extent, the human brain does this as well. As a person develops and repeats new habits, connections in their brain can change. The connections grow stronger and more effective over time, making the new, desired action easier to perform (e.g. changing one’s diet or learning a new language). In machines, though, this ability to self-improve is much more drastic.

An AI agent can process information much faster than a human, and if it does not properly understand how its actions impact people, then its self-modifications could quickly fall out of line with human values.

For Bas Steunebrink, a researcher at the Swiss AI lab IDSIA, solving this problem is a crucial step toward achieving safe and beneficial AI.

Building AI in a Complex World

Because the world is so complex, many researchers begin AI projects by developing AI in carefully controlled environments. Then they create mathematical proofs that can assure them that the AI will achieve success in this specified space.

But Steunebrink worries that this approach puts too much responsibility on the designers and too much faith in the proof, especially when dealing with machines that can learn through recursive self-improvement. He explains, “We cannot accurately describe the environment in all its complexity; we cannot foresee what environments the agent will find itself in in the future; and an agent will not have enough resources (energy, time, inputs) to do the optimal thing.”

If the machine encounters an unforeseen circumstance, then that proof the designer relied on in the controlled environment may not apply. Says Steunebrink, “We have no assurance about the safe behavior of the [AI].”

Experience-based Artificial Intelligence

Instead, Steunebrink uses an approach called EXPAI (experience-based artificial intelligence). EXPAI are “self-improving systems that make tentative, additive, reversible, very fine-grained modifications, without prior self-reasoning; instead, self-modifications are tested over time against experiential evidences and slowly phased in when vindicated, or dismissed when falsified.”

Instead of trusting only a mathematical proof, researchers can ensure that the AI develops safe and benevolent behaviors by teaching and testing the machine in complex, unforeseen environments that challenge its function and goals.

With EXPAI, AI machines will learn from interactive experience, and therefore monitoring their growth period is crucial. As Steunebrink posits, the focus shifts from asking, “What is the behavior of an agent that is very intelligent and capable of self-modification, and how do we control it?” to asking, “How do we grow an agent from baby beginnings such that it gains both robust understanding and proper values?”

Consider how children grow and learn to navigate the world independently. If provided with a stable and healthy childhood, children learn to adopt values and understand their relation to the external world through trial and error, and by examples. Childhood is a time of growth and learning, of making mistakes, of building on success – all to help prepare the child to grow into a competent adult who can navigate unforeseen circumstances.

Steunebrink believes that researchers can ensure safe AI through a similar, gradual process of experience-based learning. In an architectural blueprint developed by Steunebrink and his colleagues, the AI is constructed “starting from only a small amount of designer-specified code – a seed.” Like a child, the machine will start out less competent and less intelligent, but it will self-improve over time as it learns from teachers and real-world experience.

As Steunebrink’s approach focuses on the growth period of an autonomous agent, the teachers, not the programmers, are most responsible for creating a robust and benevolent AI. Meanwhile, the developmental stage gives researchers time to observe and correct an AI’s behavior in a controlled setting where the stakes are still low.

The Future of EXPAI

Steunebrink and his colleagues are currently creating what he describes as a “pedagogy to determine what kind of things to teach to agents and in what order, how to test what the agents understand from being taught, and, depending on the results of such tests, decide whether we can proceed to the next steps of teaching or whether we should reteach the agent or go back to the drawing board.”
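The teach, test, and decide cycle in that description can be sketched as a simple loop. This is only an illustration of the three possible outcomes named in the quote; the function names, the pass mark, and the retry limit are hypothetical and are not taken from the project.

```python
from enum import Enum
from typing import Callable, List, Sequence, Tuple


class Decision(Enum):
    PROCEED = "proceed to the next step"
    RETEACH = "reteach the agent"
    REDESIGN = "go back to the drawing board"


def decide(score: float, attempts: int,
           pass_mark: float = 0.9, max_attempts: int = 3) -> Decision:
    """Map a test score to one of the three outcomes (thresholds are assumed)."""
    if score >= pass_mark:
        return Decision.PROCEED
    if attempts < max_attempts:
        return Decision.RETEACH
    return Decision.REDESIGN


def run_curriculum(lessons: Sequence[str],
                   teach: Callable[[str], None],
                   examine: Callable[[str], float]) -> List[Tuple[str, Decision]]:
    """Teach lessons in order, testing after each attempt; stop on REDESIGN."""
    log = []
    for lesson in lessons:
        attempts = 0
        while True:
            teach(lesson)
            attempts += 1
            decision = decide(examine(lesson), attempts)
            log.append((lesson, decision))
            if decision is Decision.PROCEED:
                break        # move on to the next lesson
            if decision is Decision.REDESIGN:
                return log   # teaching alone is not working
    return log


# Toy agent whose understanding of a lesson grows by 0.5 per teaching session.
skill = {}

def teach(lesson: str) -> None:
    skill[lesson] = skill.get(lesson, 0.0) + 0.5

def examine(lesson: str) -> float:
    return min(1.0, skill.get(lesson, 0.0))

log = run_curriculum(["share", "ask-before-acting"], teach, examine)
for lesson, decision in log:
    print(lesson, "->", decision.value)
```

The order of the lessons matters in this sketch, since a later lesson is never taught until the earlier one has been passed; that mirrors the "what to teach and in what order" part of the quoted description.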

A major issue Steunebrink faces is that his method of experience-based learning diverges from the most popular methods for improving AI. Rather than doing the intellectual work of crafting a proof-backed optimal learning algorithm on a computer, researchers using EXPAI must work extensively and in person with the machine, teaching it like a child.

Creating safe artificial intelligence might prove to be more a process of teaching and growth than a function of creating the perfect mathematical proof. While such a shift in responsibility may be more time-consuming, it could also help establish a far more comprehensive understanding of an AI before it is released into the real world.

Steunebrink explains, “A lot of work remains to move beyond the agent implementation level, towards developing the teaching and testing methodologies that enable us to grow an agent’s understanding of ethical values, and to ensure that the agent is compelled to protect and adhere to them.”

The process is daunting, he admits, “but it is not as daunting as the consequences of getting AI safety wrong.”

If you would like to learn more about Bas Steunebrink’s research, you can read about his project here. He is also the co-founder of NNAISENSE.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.


  1. Nivel, E., et al. Bounded Recursive Self-Improvement. Technical Report RUTR-SCS13006, Reykjavik University. 2013.
  2. Steunebrink, B.R., et al. Growing Recursive Self-Improvers. Proceedings of the 9th Conference on Artificial General Intelligence (AGI 2016), LNAI 9782, pages 129-139. Springer, Heidelberg. 2016.
  3. Thorisson, K.R., et al. Why Artificial Intelligence Needs a Task Theory (And What It Might Look Like). Proceedings of the 9th Conference on Artificial General Intelligence (AGI 2016), LNAI 9782, pages 118-128. Springer, Heidelberg. 2016.
  4. Thorisson, K.R., et al. About Understanding. Proceedings of the 9th Conference on Artificial General Intelligence (AGI 2016), LNAI 9782, pages 106-117. Springer, Heidelberg. 2016.


  1. Colloquium Series on Robust and Beneficial AI (CSRBAI): May 27-June 17. MIRI, Berkeley, CA.
  2. Specific Workshop: Robustness and Error-Tolerance. June 4-5.
    • How can humans ensure that when AI systems fail, they fail gracefully and detectably? This is difficult for systems that must adapt to new or changing environments; standard PAC guarantees for machine learning systems fail to hold when the distribution of test data does not match the distribution of training data. Moreover, systems capable of means-end reasoning may have incentives to conceal failures that would result in their being shut down. These researchers would much prefer to have methods of developing and validating AI systems such that any mistakes can be quickly noticed and corrected.
  3. Specific Workshop: Preference Specification. June 11-12.
    • The perennial problem of wanting code to “do what I mean, not what I said” becomes increasingly challenging when systems may find unexpected ways to pursue a given goal. Highly capable AI systems thereby increase the difficulty of specifying safe and useful goals, or specifying safe and useful methods for learning human preferences.


  1. Gave a talk at the AAAI-16 conference in Phoenix. February 2016.
  2. Week-long working visit to Reykjavik University and the Icelandic Institute for Intelligent Machines, working with Kristinn Thorisson and colleagues. May 2016.
    • These researchers have drawn up a set of requirements (so far unpublished) that can be developed into a methodology for making an EXPAI agent learn and understand ethical values, and integrate them into its motivations so that it is compelled to adhere to them. Steunebrink has tested key aspects of the EXPAI-based value learning method on audiences at CSRBAI and AGI, notably the requirement that values must settle as constraints in simulation-capable agents in order to ensure that such agents will be compelled to protect their values against violation and interference.
  3. Gave a talk at the AGI-16 conference in New York. July 2016.
  4. Attended the IEEE Symposium on Ethics of Autonomous Systems (SEAS Europe). August 2016.
  5. Attended the ECAI-16 conference in The Hague. September 2016.