Adrià Garriga Alonso

Organisation
Redwood Research
Biography

Why do you care about AI Existential Safety?

It is likely that, in the next few decades, AI will surpass humans at crucial tasks like technology R&D and social persuasion. A superhuman AI will clearly understand human values, but we don't know how to accurately point to those values and tell it to optimize for them. Nor do we know how to make an AI that does not explicitly optimize for anything and just remains generally helpful. Thus, shortly after AI surpasses humans in capability, the destiny of the lightcone will no longer be in our hands. Everything that we value will be destroyed in pursuit of what the AI values, which is hopefully neutral to us, but might include immense numbers of sentient beings suffering. I want to prevent that.

Please give at least one example of your research interests related to AI existential safety:

I am working on neural network (NN) interpretability research. By understanding NN internals, I aim to be able to tell whether an AI wants to take over. Once we understand the cognitive reasons it wants to do that, hopefully we can remove them.
