Lennart Bürger
Why do you care about AI Existential Safety?
It seems likely that AGI, or even superintelligent AI, will be developed within the next few years or, if we are lucky, within the next few decades. There are strong incentives to do so, and AI companies are pouring billions of dollars into the effort. But aligning such AGIs with our values and goals is currently an unsolved problem, and even if we succeed in aligning AI, bad actors could deliberately give an AI harmful goals. This could lead to an existential catastrophe for humanity. I therefore think we should devote significantly more resources and intellectual effort to solving these challenges.
Please give at least one example of your research interests related to AI existential safety:
I am mainly interested in understanding the inner workings of AI systems so that we can detect misaligned goals or deceptive behaviour. Currently, I am working on a lie detector for large language models, with the goal of making it reliable and able to generalize well across contexts.