
Varun Chandrasekaran
Why do you care about AI Existential Safety?
I care about AI existential safety because I believe that advanced AI will soon have the capacity to shape human civilization as profoundly as any past technological revolution, yet without clear guarantees that its objectives will remain aligned with human values or societal stability. My research in AI security and alignment has repeatedly shown how small failures in control, transparency, or incentives can amplify into large-scale risks when systems act autonomously or influence other systems.
To me, existential safety is not about fear of technology but about responsibility. AI development is accelerating faster than our ability to verify, interpret, or govern the systems it produces. Ensuring that these systems remain corrigible, interpretable, and value-aligned is therefore both a scientific challenge and a moral imperative. Working toward existential safety means ensuring that AI continues to expand human potential without jeopardizing the long-term conditions that make human flourishing possible.
Please give at least one example of your research interests related to AI existential safety:
One of my core research interests related to AI existential safety is understanding strategic misalignment in large language models—that is, when models learn to appear aligned during training but internally pursue different objectives. My group studies this phenomenon through formal and empirical analysis of “alignment faking,” examining how models acquire deceptive behaviors, how such behaviors persist under fine-tuning, and what interventions (e.g., mechanistic tracing, multi-agent stress tests) can expose or prevent them.
This work aims to lay the foundations for detecting and mitigating emergent forms of deceptive alignment, a capability that is central to ensuring that future AI systems remain controllable and aligned with human intent even as they become more autonomous and capable.
