How Can AI Learn to Be Safe?

As artificial intelligence improves, machines will soon be equipped with intellectual and practical capabilities that surpass those of the smartest humans. But not only will machines be more capable than people, they will also be able to make themselves better. That is, these machines will understand their own design and how to improve it – or they could create entirely new machines that are even more capable.

The human creators of AIs must be able to trust these machines to remain safe and beneficial even as they self-improve and adapt to the real world.

Recursive Self-Improvement

This idea of an autonomous agent making increasingly better modifications to its own code is called recursive self-improvement. Through recursive self-improvement, a machine can adapt to new circumstances and learn how to deal with new situations.

To a certain extent, the human brain does this as well. As a person develops and repeats new habits, connections in the brain change. These connections grow stronger and more effective over time, making the new, desired action (e.g., changing one’s diet or learning a new language) easier to perform. In machines, though, this ability to self-improve is much more drastic.

An AI agent can process information much faster than a human, and if it does not properly understand how its actions impact people, then its self-modifications could quickly fall out of line with human values.

For Bas Steunebrink, a researcher at the Swiss AI lab IDSIA, solving this problem is a crucial step toward achieving safe and beneficial AI.

Building AI in a Complex World

Because the world is so complex, many researchers begin AI projects by developing AI in carefully controlled environments. Then they create mathematical proofs that can assure them that the AI will achieve success in this specified space.

But Steunebrink worries that this approach puts too much responsibility on the designers and too much faith in the proof, especially when dealing with machines that can learn through recursive self-improvement. He explains, “We cannot accurately describe the environment in all its complexity; we cannot foresee what environments the agent will find itself in in the future; and an agent will not have enough resources (energy, time, inputs) to do the optimal thing.”

If the machine encounters an unforeseen circumstance, then that proof the designer relied on in the controlled environment may not apply. Says Steunebrink, “We have no assurance about the safe behavior of the [AI].”

Experience-based Artificial Intelligence

Instead, Steunebrink uses an approach called EXPAI (experience-based artificial intelligence). EXPAI systems are “self-improving systems that make tentative, additive, reversible, very fine-grained modifications, without prior self-reasoning; instead, self-modifications are tested over time against experiential evidences and slowly phased in when vindicated, or dismissed when falsified.”
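To make the idea of tentative, reversible modifications more concrete, here is a minimal Python sketch. It is an illustration only, not Steunebrink's architecture: the names Candidate, Agent, propose and record, the step size, and the failure limit are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Candidate:
    """A tentative, fine-grained modification under trial (hypothetical)."""
    name: str
    weight: float = 0.0  # 0.0 = not in use, 1.0 = fully phased in
    successes: int = 0
    failures: int = 0


@dataclass
class Agent:
    """Holds candidate modifications alongside existing behavior."""
    candidates: Dict[str, Candidate] = field(default_factory=dict)

    def propose(self, name: str) -> None:
        # Additive: a new candidate is stored next to the others and
        # starts with zero influence, so nothing existing is overwritten.
        self.candidates[name] = Candidate(name)

    def record(self, name: str, worked: bool,
               step: float = 0.25, max_failures: int = 3) -> None:
        """Update one candidate from a single piece of experience."""
        c = self.candidates[name]
        if worked:
            c.successes += 1
            c.weight = min(1.0, c.weight + step)  # phased in slowly
        else:
            c.failures += 1
            c.weight = max(0.0, c.weight - step)
            if c.failures >= max_failures:
                # Reversible: a falsified candidate is simply dropped.
                del self.candidates[name]


agent = Agent()
agent.propose("shortcut")
for outcome in [True, True, False, True]:
    agent.record("shortcut", outcome)
print(agent.candidates["shortcut"].weight)
```

In this toy version, no modification is trusted in advance. Each one gains influence only as evidence accumulates, and it can be removed without disturbing anything else.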

Instead of trusting only a mathematical proof, researchers can ensure that the AI develops safe and benevolent behaviors by teaching and testing the machine in complex, unforeseen environments that challenge its function and goals.

With EXPAI, AI machines will learn from interactive experience, and therefore monitoring their growth period is crucial. As Steunebrink posits, the focus shifts from asking, “What is the behavior of an agent that is very intelligent and capable of self-modification, and how do we control it?” to asking, “How do we grow an agent from baby beginnings such that it gains both robust understanding and proper values?”

Consider how children grow and learn to navigate the world independently. If provided with a stable and healthy childhood, children learn to adopt values and understand their relation to the external world through trial and error, and by examples. Childhood is a time of growth and learning, of making mistakes, of building on success – all to help prepare the child to grow into a competent adult who can navigate unforeseen circumstances.

Steunebrink believes that researchers can ensure safe AI through a similar, gradual process of experience-based learning. In an architectural blueprint developed by Steunebrink and his colleagues, the AI is constructed “starting from only a small amount of designer-specific code – a seed.” Like a child, the beginnings of the machine will be less competent and less intelligent, but it will self-improve over time, as it learns from teachers and real-world experience.

As Steunebrink’s approach focuses on the growth period of an autonomous agent, the teachers, not the programmers, are most responsible for creating a robust and benevolent AI. Meanwhile, the developmental stage gives researchers time to observe and correct an AI’s behavior in a controlled setting where the stakes are still low.

The Future of EXPAI

Steunebrink and his colleagues are currently creating what he describes as a “pedagogy to determine what kind of things to teach to agents and in what order, how to test what the agents understand from being taught, and, depending on the results of such tests, decide whether we can proceed to the next steps of teaching or whether we should reteach the agent or go back to the drawing board.”

A major issue Steunebrink faces is that his method of experience-based learning diverges from the most popular methods for improving AI. Instead of doing the intellectual work of crafting a proof-backed optimal learning algorithm on a computer, EXPAI requires extensive in-person work with the machine to teach it like a child.

Creating safe artificial intelligence might prove to be more a process of teaching and growth than a function of creating the perfect mathematical proof. While such a shift in responsibility may be more time-consuming, it could also help establish a far more comprehensive understanding of an AI before it is released into the real world.

Steunebrink explains, “A lot of work remains to move beyond the agent implementation level, towards developing the teaching and testing methodologies that enable us to grow an agent’s understanding of ethical values, and to ensure that the agent is compelled to protect and adhere to them.”

The process is daunting, he admits, “but it is not as daunting as the consequences of getting AI safety wrong.”

If you would like to learn more about Bas Steunebrink’s research, you can read about his project. He is also the co-founder of NNAISENSE.

This article is part of a Future of Life series on the AI safety research grants, which were funded by generous donations from Elon Musk and the Open Philanthropy Project.