Factored Cognition: Amplifying Human Cognition for Safely Scalable AGI
Our goal is to understand how machine learning (ML) can be used for AGI in a way that is 'safely scalable', i.e., becomes increasingly aligned with human interests as the ML components improve. Existing approaches to AGI (including reinforcement learning (RL) and inverse reinforcement learning (IRL)) are arguably not safely scalable: the agent can become misaligned once its cognitive resources exceed those of the human overseer. Christiano's Iterated Distillation and Amplification (IDA) is a promising alternative. In IDA, the human and agent are 'amplified' into a resourceful (but slow) overseer by allowing the human to make calls to the previous iteration of the agent. By construction, this overseer is intended to always stay ahead of the agent being overseen.
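To make the amplify-then-distill loop concrete, here is a minimal toy sketch, not the authors' implementation. Several assumptions are ours: the task is summing a list of integers; the 'human' can only split a task in half and add two numbers; an agent returns `None` when a task exceeds its capacity; and distillation is idealized as a perfect, fast copy of the overseer's behavior (in real IDA this is the ML training step). All names (`base_agent`, `amplify`, `distill`, `ida`) are hypothetical.

```python
from typing import Callable, List, Optional

# Hypothetical toy setting: an "agent" maps a task (a list of ints to sum)
# to an answer, or None when the task exceeds what it can handle.
Agent = Callable[[List[int]], Optional[int]]


def base_agent(xs: List[int]) -> Optional[int]:
    """Iteration-0 agent: can only answer trivially small tasks."""
    if len(xs) <= 1:
        return xs[0] if xs else 0
    return None


def amplify(agent: Agent) -> Agent:
    """Human overseer who may call the previous agent on sub-tasks.

    The 'human' only splits the task in half and adds two numbers;
    everything else is delegated to `agent`.
    """
    def overseer(xs: List[int]) -> Optional[int]:
        if len(xs) <= 1:
            return agent(xs)
        mid = len(xs) // 2
        left, right = agent(xs[:mid]), agent(xs[mid:])
        if left is None or right is None:
            return None  # a sub-task was still too hard for the agent
        return left + right
    return overseer


def distill(overseer: Agent) -> Agent:
    """Idealized stand-in for the ML training step (assumption).

    Real distillation would train a fast model to imitate `overseer`;
    here we assume imitation is perfect and simply reuse its behavior.
    """
    def fast_agent(xs: List[int]) -> Optional[int]:
        return overseer(xs)
    return fast_agent


def ida(iterations: int) -> Agent:
    """Alternate amplification and distillation, starting from base_agent."""
    agent: Agent = base_agent
    for _ in range(iterations):
        agent = distill(amplify(agent))
    return agent


if __name__ == "__main__":
    agent = ida(3)
    print(agent(list(range(1, 9))))   # 8 items, within capacity: prints 36
    print(agent(list(range(1, 10))))  # 9 items, beyond capacity: prints None
```

In this toy, each round doubles the largest task the agent can handle (after k rounds, lists of up to 2^k items), which mirrors the intent that the amplified overseer always stays one step ahead of the agent it trains.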
Could IDA produce highly capable aligned agents given sufficiently advanced ML components? While we cannot get direct empirical evidence today, we can study the question indirectly by running amplification with humans as stand-ins for AI. This corresponds to the study of 'factored cognition': whether sophisticated reasoning can be broken down into many small and mostly independent sub-tasks. We will explore schemes for factored cognition empirically and exploit automation via ML to tackle larger tasks.
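As a rough illustration of what 'many small and mostly independent sub-tasks' can mean, the sketch below treats each function call as one worker who may look at no more than `budget` items. A worker either answers directly or splits its task into independent chunks and only sees the resulting sub-answers. The task (counting a word in a text), the `budget` parameter, and the `Trace` bookkeeping are hypothetical choices for this example, not a scheme from the original work.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Trace:
    workers: int = 0   # number of worker calls (sub-tasks) used
    max_view: int = 0  # largest number of items any single worker looked at


def count_word(tokens: List[str], target: str, budget: int, trace: Trace) -> int:
    """Count `target` in `tokens`; each call is one worker who sees at most `budget` items."""
    if budget < 2:
        raise ValueError("budget must be >= 2 so that sub-tasks get smaller")
    trace.workers += 1
    if len(tokens) <= budget:
        # Small enough for this worker to read directly.
        trace.max_view = max(trace.max_view, len(tokens))
        return sum(1 for t in tokens if t == target)
    # Too large: split into at most `budget` independent chunks.
    size = math.ceil(len(tokens) / budget)
    chunks = [tokens[i:i + size] for i in range(0, len(tokens), size)]
    sub_answers = [count_word(c, target, budget, trace) for c in chunks]
    # This worker only looks at the sub-answers, never the full text.
    trace.max_view = max(trace.max_view, len(sub_answers))
    return sum(sub_answers)


if __name__ == "__main__":
    text = "the cat sat on the mat and the dog sat on the log".split()
    trace = Trace()
    print(count_word(text, "the", budget=3, trace=trace))  # prints 4
    print(trace)  # prints Trace(workers=10, max_view=3)
```

No single worker ever handles more than three items, yet the 13-token task is answered correctly by ten workers. Real reasoning tasks are far harder to decompose than this one, which is exactly the open question that empirical experiments with humans are meant to probe.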
