Why do you care about AI Existential Safety?
One of my concerns about the rapid pace of current AI development and deployment is that it is unlikely that all alignment issues, safety concerns, and societal harms can be mitigated with post-hoc solutions. Rather, I believe that many of these issues are best addressed while AI systems are being designed, for example through the use of risk-aware learning techniques or models that are explicitly constructed to support auditing. It is therefore critical to begin research into such techniques and technological building blocks now, rather than react to problems as they arise; by the time many harms manifest, it will be too late to change the basic technologies that deployed systems are built upon or the entrenched community practices that lead to those harms.
Please give at least one example of your research interests related to AI existential safety:
My work focuses on both theoretical and practical aspects of AI alignment and risk-aware learning. I'm particularly interested in reward inference from human preferences, probabilistic performance guarantees in (inverse) reinforcement learning settings, and efficient verification of agent alignment.