Andy Zou
Why do you care about AI Existential Safety?
Artificial Intelligence (AI) now excels in a wide array of fields, including biology, visual art, writing, math, coding, and game playing. Despite the positive impacts these systems bring, they still lack important safety properties, which raises urgent concerns given the heightened risks and responsibilities that accompany their deployment. Consequently, substantial effort has focused on improving the safety of AI systems: for current systems, this might mean making autonomous vehicles more reliable or making language model outputs more factually accurate. However, these short-term risks are not the only ones that require attention. There is a growing consensus in the machine learning (ML) research community that AI systems may soon reach and surpass human-level intelligence across a broad range of tasks. In some cases, these systems may resemble autonomous agents sharing our space and resources, deployed in both the physical and digital worlds to interact freely with humans. It is paramount that as general capabilities improve rapidly, safety performance keeps pace.
Please give at least one example of your research interests related to AI existential safety:
Currently, I am creating testbeds to track agents' deceptive and power-seeking behaviors in complex environments. This could serve as a first step towards developing agents that can reason about tradeoffs and resist temptations, and towards developing methods that better guide or regulate such agents. Separately, I also work on robustness, proxy gaming, and monitoring.