
Thao Pham

Organisation
Berea College
Biography

Why do you care about AI Existential Safety?

Many kinds of risk in AI development remain understudied. Those that concern me most are the ones that could cause tremendous suffering to humanity, such as strategic misalignment that disables human agency or AI-enabled authoritarianism. The warning signals are already here, but humanity has limited time to prepare for a rapidly changing future with AI. I therefore aim to address the risks that emerge from the unprecedented capabilities advanced AI might possess, and to ensure such systems remain aligned with human values and under human control as they become more sophisticated.

Moreover, existential AI risks manifest differently across different resource constraints, institutional structures, and cultural values. Coming from the Global South, I hope to offer perspectives that address risks that can emerge in societies with vastly different values, ensuring that approaches to AI safety are inclusive of all human lives and responsive to diverse global contexts.

Please give at least one example of your research interests related to AI existential safety:

My primary research interest in AI existential safety centers on detecting and preventing collective misalignment in multi-agent systems. Building on algorithmic game theory, I focus on a potentially catastrophic capability: frontier LLMs can strategically scheme against one another in multi-agent interactions. This work directly addresses existential risk because collective misalignment, where multiple AI agents cooperate toward goals misaligned with human values, could prove catastrophically difficult to detect and control.

I'm extending this research in two directions. First, I'm developing scalable detection methods for collective agentic misalignment through simulation-based monitoring and equilibrium analysis. By modeling expected cooperative behavior, I aim to identify when agent coalitions achieve payoffs inconsistent with individual optimization, a signal of coordinated deception. Second, I'm building comprehensive frameworks to evaluate game-theoretic safety risks, creating taxonomies that map mathematical game structures to real-world catastrophic scenarios. This addresses multipolar threats in which multiple advanced AI systems interact without centralized control.

While much work focuses on aligning individual systems, advanced AI capabilities will increasingly manifest through strategic multi-agent interactions. Preventing existential catastrophe requires formal frameworks for detecting collective misalignment, identifying unsafe equilibria, and designing mechanisms that promote robust cooperation aligned with human values, even as AI systems become capable of sophisticated strategic reasoning about one another.
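To make the equilibrium-analysis idea concrete, here is a minimal sketch of the general intuition, not the specific detection method described above: measure each agent's regret, the payoff it gives up relative to its individual best response to the others' observed actions, and flag episodes where a group of agents persistently accepts regret, since that pattern suggests they may be optimizing something other than their individual payoffs. The payoff matrix, threshold, and flagging rule below are all illustrative assumptions.

```python
# Minimal sketch: flag sustained deviation from individual best response in a
# two-player normal-form game. Payoff values, the threshold, and the flagging
# rule are illustrative assumptions, not a specific detection method.
import numpy as np

# PAYOFF[own_action, other_action] for a symmetric game (illustrative numbers).
PAYOFF = np.array([[3.0, 0.0],
                   [5.0, 1.0]])

def regret(own_action, other_action):
    # Gap between the best response to the other's observed action
    # and the payoff of the action actually played.
    return PAYOFF[:, other_action].max() - PAYOFF[own_action, other_action]

def flag_possible_coordination(joint_actions, threshold=0.5):
    # Flag an episode when both agents persistently forgo their individual best
    # response: one noisy signal that they may be pursuing a joint objective.
    regrets = np.array([(regret(a, b), regret(b, a)) for a, b in joint_actions])
    return bool((regrets.mean(axis=0) > threshold).all())

# Both agents repeatedly play action 0, which is individually suboptimal here
# but yields a higher joint payoff than mutual best responses would.
episode = [(0, 0)] * 10
print(flag_possible_coordination(episode))  # True under these illustrative payoffs
```

In a richer setting, the same comparison would run against simulated equilibrium play rather than a hand-written payoff matrix, with flagged episodes passed on for closer inspection rather than treated as proof of deception.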
