Matija Franklin

Organisation

University College London

Member of

Biography

Why do you care about AI Existential Safety?

AI existential safety is about more than extreme, apocalyptic scenarios—it’s about ensuring that the systems we build today remain aligned with human values as they scale and become more autonomous. From my work on AI manipulation, I’ve seen how even well-intentioned systems can subtly influence behaviour or decision-making in ways we don’t fully anticipate. This isn’t just about controlling a hypothetical superintelligence, but about understanding the risks posed by AI systems that manipulate incentives, exploit cognitive biases, or introduce failures into critical infrastructure. The existential risk is that AI systems, if misaligned or deployed too quickly, could push society toward unintended and harmful outcomes. These risks are subtle, cumulative, and potentially irreversible as AI becomes more embedded in key societal functions. We need to think beyond immediate dangers and account for the slow-building risks that emerge from systems optimising for goals that conflict with human well-being. Ensuring safety is about safeguarding our long-term future by embedding robust, proactive measures into the development cycle, well before AI systems exceed our ability to control them.

Please give at least one example of your research interests related to AI existential safety:

One of my core research interests related to AI existential safety is understanding the mechanisms of AI manipulation and influence, particularly how these systems can subtly shape human behaviour and decision-making. This area is critical to existential safety because as AI systems become more powerful and autonomous, their ability to influence large-scale social, political, and economic processes will increase, often in ways we cannot easily predict or control.For example, in my work with DeepMind, we identified specific mechanisms by which AI systems could manipulate users through trust-building, personalisation, or by exploiting cognitive biases. These mechanisms might seem benign in small-scale interactions, but when deployed widely, they could erode autonomy, skew decision-making at societal levels, or enable strategic misuse. If we don’t address these risks early, we could see AI systems that, even without malicious intent, push us toward outcomes that compromise our long-term safety and societal stability.My research focuses on developing ways to evaluate and mitigate these manipulation mechanisms. This includes designing evaluation techniques to detect manipulation in both pre- and post-deployment phases and creating mitigation strategies like prompt engineering and reinforcement learning. I see this as a crucial part of ensuring that as AI systems scale, they do so in a way that aligns with human values and safeguards against large-scale, unintended consequences. AI manipulation is an existential concern not just because of the immediate risks, but because it represents how AI systems, if misaligned, could slowly and subtly shift the course of human history in ways that undermine our autonomy and well-being.

Matija Franklin

Sign up for the Future of Life Institute newsletter