The Control Problem for Universal AI: A Formal Investigation
The agent framework, the expected utility principle, sequential decision theory, and the information-theoretic foundations of inductive reasoning and machine learning have already brought significant order into the previously heterogeneous and scattered field of artificial intelligence (AI). Building on this, in the last decade I have developed the theory of Universal AI. It is the first and currently only mathematically rigorous top-down approach to formalizing artificial general intelligence. This project will drive forward the theory of Universal AI to address what might be the 21st century's most significant existential risk by tackling the Control Problem, the unique principal-agent problem that arises with the creation of an artificial superintelligent agent. The goal is to extend the existing theory to enable formal investigations into the Control Problem for generally intelligent agents. Our focus is on the most essential piece that the theory of Universal AI lacks, namely a theory of agents embedded in the real world: the agent does not model itself reliably, the theory is constrained to a single agent, the agent does not explore safely, and it is not well understood how to specify goals that are aligned with human values.