Samuel Albanie
Why do you care about AI Existential Safety?
As technology grows more powerful, the consequences of its failure also grow. Over the last few decades, significant advances in research across both hardware and software have yielded meaningful gains in AI capabilities. In the absence of hard physical limits that preclude further gains, I believe it is prudent to consider what happens if progress continues, and to allocate research effort towards mitigating the risks of this eventuality.
Please give at least one example of your research interests related to AI existential safety:
- Foundation models have yielded striking gains across a broad suite of tasks spanning text, vision and code. However, the self-supervised pretraining objectives of these models often do not precisely match the desired end objective of the user. Given this disparity, I’m interested in the question of how these models can be induced to be maximally helpful for human end users. Currently, I’m considering this problem through the lens of natural language prompting – communicating tasks through utterances.
- I’m also interested in better understanding the potential for machine learning models to learn manipulation strategies (see, e.g., https://arxiv.org/abs/1701.04895).