Why do you care about AI Existential Safety?
AI systems are already perfectly capable of doing significant damage to society when designed or implemented badly. The potential downside looks to only get worse as their capabilities increase as the more powerful a system, the smaller the misalignment needed for the same negative consequences. I think that clearly this requires as much attention as we can give it, given the volume of capabilities research and the ease at which systems can be deployed without oversight.
Please give at least one example of your research interests related to AI existential safety:
I work mostly on methods for inverse reinforcement learning and interpretable imitation learning, with an aim to learn about and audit the decision making systems in agents. Being able to understand the goals and incentives of systems should allow us to more effectively intervene if it looks like their behaviour may lead to significant negative outcomes. As AI systems become more capable, it’s very important for us to be able to monitor them and ensure that their aims do not diverge from ours.