Skip to content

AI Researcher Ramana Kumar

September 30, 2016
Revathi Kumar


AI Safety Research

Ramana Kumar

PhD student

University of Cambridge

Project: Applying Formal Verification to Reflective Reasoning

Amount Recommended:    $36,750

Project Summary

One path to significantly smarter-than-human artificial agents involves self-improvement, i.e., agents doing artificial intelligence research to make themselves even more capable. If such an agent is designed to be robust and beneficial, it should only execute self-modifying actions if it knows they are improvements, which, at a minimum, means being able to trust that the modified agent only takes safe actions. However, trusting the actions of a similar or smarter agent can lead to problems of self-reference, which can be seen as sophisticated versions of the liar paradox (which shows that the self-referential sentence “this sentence is false” cannot be consistently true or false). Several partial solutions to these problems have recently been proposed. However, current software for formal reasoning does not have sufficient support for self-referential reasoning to make these partial solutions easy to implement and study. In this project, we will implement a toy model of agents using these partial solutions to reason about self-modifications, in order to improve our understanding of the challenges of implementing self-referential reasoning, and to stimulate work on tools suitable for it.

Technical Abstract

Artificially intelligent agents designed to be highly reliable are likely to include a capacity for formal deductive reasoning to be applied in appropriate situations, such as when reasoning about computer programs including other agents and future versions of the same agent. However, it will not always be possible to model other agents precisely: considering more capable agents, only abstract reasoning about their architecture is possible. Abstract reasoning about the behavior of agents that justify their actions with proofs lead to problems of self-reference and reflection: Godel’s second incompleteness theorem shows that no sufficiently strong proof system can prove its own consistency, making it difficult for agents to show that actions their successors have proven to be safe are in fact safe (since an inconsistent proof system would be able to prove any action “safe”). Recently, some potential approaches to circumventing this obstacle have been proposed in the form of pen-and-paper proofs.

We propose building and studying implementations of agents using these approaches, to better understand the challenges of implementing tools that are able to support this type of reasoning, and to stimulate work in the interactive theorem proving community on this kind of tools.

Workshop Participation

  1. Colloquium Series on Robust and Beneficial AI (CSRBAI) – May 27-June 17, 2016.
  2. Self-Reference, Type Theory, and Formal Verification – April 1-3, 2016
    • In this conference, hosted by MIRI, participants worked on questions of self-reference in type theory and automated theorem provers, with the goal of studying systems that model themselves.

This content was first published at on September 30, 2016.

About the Future of Life Institute

The Future of Life Institute (FLI) is a global non-profit with a team of 20+ full-time staff operating across the US and Europe. FLI has been working to steer the development of transformative technologies towards benefitting life and away from extreme large-scale risks since its founding in 2014. Find out more about our mission or explore our work.

Our content

Related content

Other posts about 

If you enjoyed this content, you also might also be interested in:

AI Researcher Michael Wooldridge

AI Safety Research Michael Wooldridge Head of Department of Computer Science, Professor of Computer Science University of Oxford Senior Research […]
October 1, 2016

AI Researcher Brian Ziebart

AI Safety Research Brian Ziebart Assistant Professor Department of Computer Science University of Illinois at Chicago Project: Towards Safer […]
October 1, 2016

AI Researcher Jacob Steinhardt

AI Safety Research Jacob Steinhardt Graduate Student Stanford University Project: Summer Program in Applied Rationality and Cognition Amount Recommended:    $88,050 […]
October 1, 2016

AI Researcher Bas Steunebrink

AI Safety Research Bas Steunebrink Artificial Intelligence / Machine Learning, Postdoctoral Researcher IDSIA (Dalle Molle Institute for Artificial Intelligence) […]
October 1, 2016
Our content

Sign up for the Future of Life Institute newsletter

Join 40,000+ others receiving periodic updates on our work and cause areas.
cloudmagnifiercrossarrow-up linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram