Jacob Noah Steinhardt

Why do you care about AI Existential Safety?

In the coming decades, AI will likely have a transformative effect on society, including potentially automating and then surpassing almost all human labor. For these effects to be beneficial, we need better forecasting of AI capabilities, better tools for understanding and aligning AI systems, and a community of researchers, engineers, and policymakers prepared to implement necessary responses. I aim to help with all of these, starting from a foundation of basic research.

Please give one or more examples of research interests relevant to AI existential safety:

I have written several position papers on research agendas for AI safety, including “Concrete Problems in AI Safety”, “AI Alignment Research Overview”, and “Unsolved Problems in ML Safety”. Current projects study robustness, reward learning and reward hacking, unintended consequences of ML (especially in economic or many-to-many contexts), interpretability, forecasting, and safety from the perspective of complex systems theory.