
Joan Velja

Organisation
University of Oxford
Biography

Why do you care about AI Existential Safety?

AI is taking the world by storm: the fast adoption and large-scale deployment of these systems pose incrementally larger risks with each development cycle, risks we may not be prepared for. We have swiftly gone from a world where AIs could not reliably play games to one where transformative systems by the end of the decade cannot be ruled out. For this reason, I have been devoting my time to alignment—how priors, incentives, and oversight actually bite when models meet the messy world. Small failures become system failures when billions of users, automated tooling, and economic feedback loops are in the mix. Meanwhile, we still struggle to explain why models generalize when they do, fail when they shouldn’t, or pursue goals implicit in training data rather than explicit oversight. The downsides (and upsides) are staggering, and I want to contribute to the understanding of this technology to ensure it benefits us all.

Please give at least one example of your research interests related to AI existential safety:

My research focuses on topics at the intersection of theoretical and prosaic AI alignment. Recently, I have concentrated on Scalable Oversight and on how these protocols behave when applied to learning-based systems: neural networks are quirky objects that carry biases and generalize unexpectedly, and I strongly believe this must be taken into account when designing alignment methods, alongside theoretical guarantees.

In the past, I have worked on the challenge of partial observability in RLHF, steganographic collusion between LLMs, and topics adjacent to AI Control.
