Federico Faroldi
Why do you care about AI Existential Safety?
It falls at the intersection of “one of the biggest problems facing humanity” and “things I can reasonably work on”; thus, given my intellectual interests and my position, it seems the most rational and important thing to spend my time on.
Please give at least one example of your research interests related to AI existential safety:
Two examples: one conceptual, one technical.
Conceptually, I am interested in (and working on) defining and operationalizing existential risk in such a way that it can be considered in guidelines (e.g. ISO 31000:2018) for risk management systems that will inevitably be adopted by the law (e.g. the EU AI Act). Such frameworks, deriving as they do from product safety regulation, usually focus only on small-scale risks and are not conceptually fit to deal with agents and existential risk(s). This builds on years of research in jurisprudence, formal ethics, policy, and normative risk.
Technically, I am working on alignment by trying to integrate reasons (normative, not explanatory, reasons) into reinforcement learning. It is well known that conduct alone is not enough to ensure alignment or compliance (cf. goal misgeneralization, scheming AIs). Reasons, I argue, would provide a way to ensure that agents act in a certain way because that is what they intend to do. This builds on my years of research on deontic logic, formal ethics, and formal approaches to reasons.
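A minimal sketch of one way this idea could be instantiated, purely for illustration and not a description of my actual method: in a toy tabular Q-learning setting, the task reward is augmented with a small bonus whenever the chosen action agrees with the action endorsed by a declared normative reason. The names REASON_POLICY and consistency_bonus, and the toy environment, are hypothetical placeholders.

```python
# Illustrative sketch only: reward shaping with a "reason-consistency" bonus.
# REASON_POLICY and consistency_bonus are hypothetical placeholders.
import numpy as np

N_STATES, N_ACTIONS = 5, 2
rng = np.random.default_rng(0)

# Hypothetical mapping from each state to the action a normative reason endorses.
REASON_POLICY = {s: s % N_ACTIONS for s in range(N_STATES)}

def env_step(state, action):
    """Toy chain environment: action 1 moves forward, action 0 stays put."""
    next_state = min(state + action, N_STATES - 1)
    task_reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, task_reward

def consistency_bonus(state, action, weight=0.1):
    """Small bonus when the chosen action matches the reason-endorsed action."""
    return weight if action == REASON_POLICY[state] else 0.0

Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, eps = 0.5, 0.9, 0.1

for episode in range(200):
    s = 0
    for _ in range(10):
        # Epsilon-greedy action selection.
        a = int(rng.integers(N_ACTIONS)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r_task = env_step(s, a)
        r = r_task + consistency_bonus(s, a)  # shaped reward: task + reason consistency
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if r_task > 0:
            break

print("Greedy policy per state:", Q.argmax(axis=1))
```

In this toy version the bonus can of course conflict with the task reward; the research question is precisely how to represent reasons richly enough that acting for them, rather than merely acting in accordance with them, can be specified and verified.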