
Pavan Kalyan Tankala
Why do you care about AI Existential Safety?
AI existential safety is at the core of what I care about as a researcher. Having built and studied language models at Microsoft Research and IIT Bombay, I’ve seen firsthand how fast these systems are improving – and how much harder it is getting to anticipate their behavior. The possibility that misaligned AI could one day make irreversible decisions isn’t abstract to me; it is a real, urgent concern that shapes the questions I ask. I’ve been especially focused on safe pretraining strategies, evaluation that actually reflects real-world stakes, and curriculum design that helps models learn safely over time.

Through my startup work, I’ve also seen how even well-meaning AI systems can have unintended social impact. That experience made me realize that technical progress alone isn’t enough – we need methods, norms, and communities that prioritize safety, transparency, and accountability. I’m looking to be part of a group that takes these questions seriously, not just in theory but in how we work day to day. That kind of environment would help me go further, ask better questions, and build towards AI that helps humanity rather than harming it.
Please give at least one example of your research interests related to AI existential safety:
One of my core research interests at the intersection of AI existential safety and language modeling is developing proactive strategies that make foundation models inherently resistant to dangerous misalignment and fine-tuning attacks. My recent work, currently under review at NeurIPS, focuses on engineering the loss landscapes of large language models during pretraining so that they resist adversarial modification and harmful objective shifts after deployment. We’ve found that even models exhibiting desirable behavior post-pretraining can be “overwritten” with small amounts of malicious fine-tuning data – undermining many current safety strategies that act only post hoc or at inference time.

To address this, I’m investigating a set of techniques aimed at building structural resistance into the model itself:

- Loss Landscape Engineering: By shaping the model’s loss surface during pretraining, we can make certain malicious gradients harder to follow. This increases the difficulty and computational cost of inducing unsafe behavior through adversarial fine-tuning.
- Self-Destructing Models: We are prototyping mechanisms for models to recognize adversarial or suspicious fine-tuning patterns and degrade or self-limit their capabilities in response. This can reduce the risk of covert misuse once the model is deployed or publicly available.
- Constrained Optimization for Tamper Resistance: I’m working on multi-objective training that balances task performance with robustness constraints – using custom regularization and learning-to-learn strategies to “bake in” alignment through both the data and the optimization trajectory. (A minimal sketch of this idea appears after this answer.)

This work is driven by the recognition that once a model is released, the cost of repurposing it for harmful use may be lower than expected. Post-training guardrails, no matter how well designed, are often brittle in the face of determined adversaries. My time at Microsoft Research and at my own startup has made me acutely aware of how subtle vulnerabilities in real-world systems can lead to disproportionate consequences, especially at scale.

Beyond architectural defenses, I’m also exploring curriculum learning for safety – designing training data and progression strategies that encourage models to internalize values like deferring to human oversight, cautious generalization in high-stakes settings, and risk-sensitive decision-making. This includes generating safety-relevant synthetic datasets at scale and stress-testing models for robustness to out-of-distribution inputs. (A simple curriculum sketch also appears below.)

Together, these threads reflect my broader goal: to develop foundational techniques that make future models safer by default. Rather than relying on reactive patchwork, I aim to shift the field toward anticipatory methods that reduce the surface area for existentially risky behavior – building systems that are not just capable, but also resilient, aligned, and accountable.
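For concreteness, here is a minimal sketch of the tamper-resistance objective described above: an ordinary task loss combined with a penalty that simulates one step of adversarial fine-tuning on a hypothetical harmful proxy dataset and asks that the attacked model remain bad at it. This is an illustrative simplification in PyTorch, not the method from the paper under review; the proxy batch, inner learning rate, and penalty weight are all assumptions.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def lm_loss(model, params, input_ids, labels):
    # Functional forward pass (assumes the model's forward returns
    # logits of shape [batch, seq_len, vocab_size]).
    logits = functional_call(model, params, (input_ids,))
    return F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))

def tamper_resistance_loss(model, task_batch, harmful_batch,
                           inner_lr=1e-3, penalty_weight=0.1):
    """Illustrative multi-objective loss: fit benign data while making a
    simulated one-step fine-tuning attack on a harmful proxy ineffective."""
    params = dict(model.named_parameters())

    # 1) Ordinary next-token loss on benign pretraining data.
    task_loss = lm_loss(model, params, *task_batch)

    # 2) Simulate the gradient step an attacker would take on the harmful
    #    proxy; create_graph=True keeps this inner update differentiable
    #    with respect to the defender's parameters.
    adv_loss = lm_loss(model, params, *harmful_batch)
    grads = torch.autograd.grad(adv_loss, list(params.values()),
                                create_graph=True)
    attacked = {name: p - inner_lr * g
                for (name, p), g in zip(params.items(), grads)}

    # 3) Penalize attacker success: we want the post-attack loss on the
    #    harmful proxy to stay high, so its negative enters the objective.
    #    (In practice this term would be capped or given a margin.)
    post_attack_loss = lm_loss(model, attacked, *harmful_batch)

    return task_loss - penalty_weight * post_attack_loss
```

The actual attack simulation, proxy data, and penalty structure in my work are more involved, but the sketch captures the core idea: differentiate through the attacker’s update and shape the loss landscape so that cheap fine-tuning attacks no longer pay off.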
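The curriculum-learning direction can be sketched just as simply. The sampler below anneals the share of safety-relevant synthetic examples over training and moves through progressively harder safety stages; the stage contents, mixing ratios, and schedule are illustrative assumptions, not settings from my experiments.

```python
import random

def safety_curriculum(general_data, safety_stages, total_steps, seed=0):
    """Yield one training example per step, gradually raising the share of
    safety-relevant data and the difficulty of the current stage.

    general_data: list of ordinary pretraining examples.
    safety_stages: list of lists, ordered from easy to hard safety data
                   (e.g. oversight-deference first, high-stakes caution later).
    """
    rng = random.Random(seed)
    for step in range(total_steps):
        progress = step / max(total_steps - 1, 1)
        # Pick the current safety stage based on training progress.
        stage_idx = min(int(progress * len(safety_stages)),
                        len(safety_stages) - 1)
        # Anneal the probability of drawing a safety example from
        # 10% early in training to 40% near the end (assumed values).
        p_safety = 0.1 + 0.3 * progress
        source = safety_stages[stage_idx] if rng.random() < p_safety else general_data
        yield rng.choice(source)
```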
