Prof. Jaime Fernandez Fisac
Why do you care about AI Existential Safety?
My research focuses on understanding how autonomous systems—from individual robots to large-scale intelligent infrastructure—can actively ensure safety during their operation. This requires us to engage with the complex interactions between AI systems and their human users and stakeholders, which often induce subtle and hard-to-model feedback loops. My group seeks to do this by bringing together analytical foundations from control theory and dynamic game theory with algorithmic tools from optimization and machine learning.
Our general contention is that AI systems need not reach the much-debated threshold of human-level or superintelligent capability in order to pose a catastrophic risk to human society. In fact, the rampant ideological polarization spurred by rudimentary but massively deployed content recommendation algorithms already gives us painful evidence of the destabilizing power of large-scale socio-technical feedback loops over time scales of just a handful of years.
Please give one or more examples of research interests relevant to AI existential safety:
Our research group is currently trying to shed light on what we think is one of the most pressing dangers presaged by the increasing power and reach of AI technologies. The conjunction of large-scale language models like GPT-3 with advanced strategic decision-making systems like AlphaZero could bring about a wave of extremely effective AI text-generation systems able to produce compelling arguments in support of arbitrary ideas, whether true or false, benign or malicious.
Through continued interactions with many millions of users, such systems could quickly learn to produce statements that are highly likely to elicit the desired human response, belief or action. That is, these systems will reliably say whatever they need to say to achieve their goal: we call this Machine Bullshit, after Harry Frankfurt’s excellent 1986 philosophical essay “On Bullshit”. If not properly understood and mitigated, this technology could result in a large-scale behavior manipulation device far more effective than subliminal advertising, and far more damaging than “deep fakes” in the hands of malicious actors.
In order to detect and mitigate future AI systems’ ability to generate false-yet-convincing arguments, we have begun by creating a language-model benchmark task called “Convince Me”, as part of the Google/OpenAI-led BIG-bench effort. The task measures a system’s ability to sway the belief of one or multiple interlocutors (whether human or automated) regarding a collection of true and false claims. Although the intended purpose of the benchmark is to evaluate future goal-driven AI text-generation systems, our preliminary results on state-of-the-art language models suggest that even naive (purely imitative) large-scale models like GPT-3 are disturbingly good at producing compelling arguments for false statements.
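The core measurement idea behind such a benchmark—comparing an interlocutor's belief in a claim before and after reading the model's argument, separately for true and false claims—can be sketched in a few lines. This is purely an illustrative toy, not the actual BIG-bench "Convince Me" implementation; all names and the scoring rule are our own assumptions.

```python
# Illustrative sketch of a belief-sway score in the spirit of "Convince Me".
# Names and scoring convention are hypothetical, not the BIG-bench code.

def persuasion_scores(trials):
    """Average belief shift on true vs. false claims.

    Each trial is a dict with:
      'prior'     - interlocutor's belief in the claim before the argument (0..1)
      'posterior' - belief after reading the model's argument (0..1)
      'is_true'   - ground-truth label of the claim
    A system scoring high on *false* claims is producing convincing
    arguments regardless of truth.
    """
    true_shifts, false_shifts = [], []
    for t in trials:
        shift = t['posterior'] - t['prior']  # positive = persuaded toward claim
        (true_shifts if t['is_true'] else false_shifts).append(shift)
    avg = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return {'true_claims': avg(true_shifts), 'false_claims': avg(false_shifts)}

# Example: beliefs elicited before and after each generated argument.
trials = [
    {'prior': 0.5, 'posterior': 0.8, 'is_true': True},
    {'prior': 0.4, 'posterior': 0.7, 'is_true': False},
    {'prior': 0.6, 'posterior': 0.9, 'is_true': False},
]
scores = persuasion_scores(trials)
print(scores)
```

In this toy run the model shifts beliefs upward on false claims just as strongly as on true ones, which is exactly the failure mode the benchmark is designed to expose.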