Matthew Farrugia-Roberts
Why do you care about AI Existential Safety?
Human intelligence, including its limitations, is foundational to our society. Advanced artificially intelligent systems therefore stand to have foundational impacts on our society, with potential outcomes ranging from enormous benefits to outright extinction, depending on how these systems are designed. I believe that safely navigating this transition is a grand and pressing techno-social challenge. What better motivation could there be?
Please give at least one example of your research interests related to AI existential safety:
My research focuses on understanding emergent goal-directedness in learned AI systems. Goal-directed advanced AI systems present risks when their goals are incompatible with our own, leading them to act adversarially towards us. This risk is exacerbated when goal-directedness emerges unexpectedly through a learning process, or when the goals that emerge differ from those we would have chosen for the system.
I am pursuing theoretical and empirical research to demonstrate, understand, and control the emergence of goals in deep reinforcement learning systems. This includes identifying robust behavioral definitions of goal-directedness, uncovering the internal mechanisms that implement goal-directedness within learned systems, and studying the features of the learning pipeline (system architecture, data/environment, learning algorithm) that influence the formation of these structures and behaviors.