- Incentivizing neural networks to give answers that are easily checkable. We are doing this using prover-verifier games whose equilibrium requires finding a proof system (see the sketch after this list).
- Understanding (in terms of neural net architectures) when mesa-optimizers are likely to arise, how they generalize, and how this should inform the design of learning algorithms.
- Better tools for understanding neural networks.
- Better understanding of neural net scaling laws (which are an important input to AI forecasting; one common empirical form is shown below).
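The following is not from the profile; it is a minimal toy sketch of the checkability idea behind prover-verifier games: the prover must return not just an answer but a certificate that a much cheaper verifier can validate, so a prover rewarded only for accepted answers is pushed toward a proof system. The task, function names, and reward structure here are all illustrative assumptions.

```python
from __future__ import annotations

import random


def prover(n: int) -> tuple[str, int | None]:
    """Claim whether n is composite, with a checkable certificate (a factor)."""
    # Expensive search: the prover does the hard work of finding the certificate.
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return "composite", d
    return "prime", None


def verifier(n: int, claim: str, certificate: int | None) -> bool:
    """Cheap check: accept a 'composite' claim only with a valid nontrivial factor."""
    if claim == "composite":
        return certificate is not None and 1 < certificate < n and n % certificate == 0
    # Toy simplification: 'prime' claims are trusted rather than independently proved.
    return claim == "prime"


# The prover "wins" only when the verifier accepts, so at equilibrium it must
# emit answers that come with certificates the verifier can validate cheaply.
for n in random.sample(range(4, 10_000), 5):
    claim, cert = prover(n)
    print(n, claim, cert, "accepted" if verifier(n, claim, cert) else "rejected")
```

In a learned version of this game, both roles would be neural networks and the verifier's capacity would be deliberately limited; the toy above only illustrates why the equilibrium favors easily checkable answers.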
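As context for the last item (an illustrative addition, not part of the profile), neural net scaling laws are empirical power-law fits of loss against scale. One commonly cited form, from Kaplan et al. (2020), relates test loss to parameter count $N$, with the fitted constants quoted only roughly here:

```latex
L(N) \approx \left( \frac{N_c}{N} \right)^{\alpha_N},
\qquad \alpha_N \approx 0.076, \quad N_c \approx 8.8 \times 10^{13}
```

Fits of this kind are what make loss at larger scales forecastable before the larger model is trained, which is why they matter for AI forecasting.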
![Roger Grosse](https://futureoflife.org/wp-content/uploads/2021/09/RogerGrosse-e1632931292615.jpg)
Roger Grosse
Why do you care about AI Existential Safety?
Humanity has produced some powerful and dangerous technologies, but so far none that deliberately pursue long-term goals that may be at odds with our own. If we succeed in building machines smarter than ourselves, as seems likely to happen in the next few decades, our only hope for a good outcome is to prepare well in advance.
Please give one or more examples of research interests relevant to AI existential safety: