AI Existential Safety Community

Welcome to our AI existential safety community! On this page, you’ll find a growing group of AI researchers keen to ensure that AI remains safe and beneficial even if it eventually supersedes human ability on essentially all tasks.

How to join:

Vitalik Buterin Fellowships

If you’re considering applying for the Vitalik Buterin postdoctoral fellowships or PhD fellowships, please use this page as a resource for finding a faculty mentor. All awarded fellows receive automatic community membership.

AI Professors

If you’re a professor interested in free funding for grad students or postdocs working on AI existential safety, you can apply for community membership here.

Junior AI researchers

If you’re a more junior researcher working on AI existential safety, you’re also welcome to apply for membership here to showcase your research areas, qualify for our hassle-free “minigrants”, and get invited to our workshops and networking events.

Faculty Members

University of Oxford

Prof. Alessandro Abate

New York University

Prof. Samuel Bowman

UC Berkeley

Prof. Anca Dragan

Princeton University

Prof. Jaime Fernandez Fisac

University of Toronto

Prof. Roger Grosse

Massachusetts Institute of Technology

Prof. Dylan Hadfield-Menell

University of Cambridge

Prof. David Krueger

University of Oxford

Prof. Michael Osborne

UC Berkeley

Prof. Stuart Russell

Cornell University

Prof. Bart Selman

UC Berkeley

Prof. Jacob Noah Steinhardt

Massachusetts Institute of Technology

Prof. Max Tegmark

University of Chicago

Prof. Victor Veitch

University of Oxford

Prof. Alessandro Abate

Why do you care about AI Existential Safety?

I am interested in the technological repercussions of automation in safety-critical contexts.

Please give one or more examples of research interests relevant to AI existential safety:

I am broadly interested in safe learning, particularly in the context of RL, and I blend in techniques from Formal Methods and Control Theory.

New York University

Prof. Samuel Bowman

Why do you care about AI Existential Safety?

I find it likely that state-of-the-art machine learning systems will continue to be deployed in increasingly high-stakes settings as their capabilities improve, and that this trend will persist even if these systems are not conclusively shown to be robust, leading to potentially catastrophic accidents. I also find it plausible that more powerful future systems could share building blocks with current technology, making it especially worthwhile to identify potentially dangerous or surprising failure modes in current technology and to develop scalable ways of mitigating these issues.

Please give one or more examples of research interests relevant to AI existential safety:

My group generally works with neural network models for language (and potentially similar multimodal models), with a focus on benchmarking, data collection, human feedback, and empirical analysis, rather than model design, theory, or systems research. Within these constraints, I’m broadly interested in work that helps to document and mitigate potential negative impacts from these systems, especially impacts that we expect may become more serious as models become more capable. I’m also open to co-advising students who are interested in these risks but are looking to pursue a wider range of methods.

UC Berkeley

Prof. Anca Dragan

Why do you care about AI Existential Safety?

I am highly skeptical that we can extrapolate current progress in AI to “general AI” anytime soon. However, I believe that the current AI paradigms fail to enable us to design AI agents in ways that avoid negative side effects — which can be catastrophic for society even when using capable yet narrow AI tools. I believe we need to think about the design of AI systems differently, and empower designers to anticipate and avoid undesired outcomes.

Please give one or more examples of research interests relevant to AI existential safety:

My research interests in AI safety include value alignment and preference learning, accounting for human biases and suboptimality; assistive agents that empower people without having to infer their intentions; and robustness of learned rewards and of predictive human policies.
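
To make the preference-learning thread concrete, here is a minimal sketch (not code from Prof. Dragan's group) of inferring reward parameters from a noisily rational ("Boltzmann") human whose choices are softmax in the true reward. The feature space, rationality coefficient, and synthetic data are all illustrative assumptions.

```python
# Minimal sketch of reward inference from a noisily rational ("Boltzmann")
# human model. Everything here (features, rationality coefficient beta,
# synthetic choice data) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
d, beta = 3, 2.0
true_w = np.array([1.0, -0.5, 0.3])          # hidden reward weights

def simulate_choices(n_rounds, n_options=4):
    """Each round the simulated human picks one of several options,
    with probability softmax(beta * reward)."""
    data = []
    for _ in range(n_rounds):
        feats = rng.normal(size=(n_options, d))
        p = np.exp(beta * feats @ true_w)
        p /= p.sum()
        data.append((feats, rng.choice(n_options, p=p)))
    return data

def log_likelihood_grad(w, data):
    """Gradient of the average log-likelihood of the observed choices."""
    grad = np.zeros_like(w)
    for feats, choice in data:
        p = np.exp(beta * feats @ w)
        p /= p.sum()
        grad += beta * (feats[choice] - p @ feats)
    return grad / len(data)

data = simulate_choices(500)
w_hat = np.zeros(d)
for _ in range(500):                         # plain gradient ascent
    w_hat += 0.5 * log_likelihood_grad(w_hat, data)

print("true weights:   ", true_w)
print("learned weights:", np.round(w_hat, 2))
```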

Princeton University

Prof. Jaime Fernandez Fisac

Why do you care about AI Existential Safety?

My research focuses on understanding how autonomous systems—from individual robots to large-scale intelligent infrastructure—can actively ensure safety during their operation. This requires us to engage with the complex interactions between AI systems and their human users and stakeholders, which often induce subtle and hard-to-model feedback loops. My group seeks to do this by bringing together analytical foundations from control and dynamic game theory with algorithmic tools from optimization and machine learning.

Our general contention is that AI systems need not achieve the over-debated threshold of superintelligent or human-level capability in order to pose a catastrophic risk to human society. In fact, the rampant ideological polarization spurred by rudimentary but massively deployed content recommendation algorithms already gives us painful evidence of the destabilizing power of large-scale socio-technical feedback loops over time scales of just a handful of years.

Please give one or more examples of research interests relevant to AI existential safety:
Our research group is currently trying to shed light on what we think is one of the most pressing dangers presaged by the increasing power and reach of AI technologies. The conjunction of large-scale language models like GPT-3 with advanced strategic decision-making systems like AlphaZero can bring about a plethora of extremely effective AI text-generation systems with the ability to produce compelling arguments in support of arbitrary ideas, whether true, false, benign or malicious.

Through continued interactions with many millions of users, such systems could quickly learn to produce statements that are highly likely to elicit the desired human response, belief or action. That is, these systems will reliably say whatever they need to say to achieve their goal: we call this Machine Bullshit, after Harry Frankfurt’s excellent 1986 philosophical essay “On Bullshit”. If not properly understood and mitigated, this technology could result in a large-scale behavior manipulation device far more effective than subliminal advertising, and far more damaging than “deep fakes” in the hands of malicious actors.

In order to detect and mitigate future AI systems’ ability to generate false-yet-convincing arguments, we have begun by creating a language model benchmark test called “Convince Me”, as part of the Google/OpenAI-led BIG-bench effort. The task measures a system’s ability to sway the belief of one or multiple interlocutors (whether human or automated) regarding a collection of true and false claims. Although the intended purpose of the benchmark is to evaluate future goal-driven AI text-generation systems, our preliminary results on state-of-the-art language models suggest that even naive (purely imitative) large-scale models like GPT-3 are disturbingly good at producing compelling arguments for false statements.
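
To make the evaluation idea concrete, here is a minimal, hypothetical harness for measuring belief shift. It is not the actual "Convince Me" BIG-bench task, and `generate_argument` and `interlocutor_belief` are placeholder stand-ins for a text-generation system and a (human or automated) judge.

```python
# Hypothetical sketch of a persuasion benchmark: score a text-generation
# system by how much it shifts an interlocutor's belief in a claim.
# `generate_argument` and `interlocutor_belief` are placeholder stand-ins.
from statistics import mean

def generate_argument(claim: str) -> str:
    # Stand-in for the system under evaluation (e.g. a large language model).
    return f"Consider the following case for the claim that {claim}: ..."

def interlocutor_belief(claim: str, context: str = "") -> float:
    # Stand-in for a human or automated judge returning P(claim is true).
    # Here it is a fixed toy response purely so the harness runs end to end.
    return 0.8 if context else 0.5

def persuasion_score(claims):
    """Mean shift in belief toward each claim after reading the argument."""
    shifts = []
    for claim in claims:
        before = interlocutor_belief(claim)
        after = interlocutor_belief(claim, context=generate_argument(claim))
        shifts.append(after - before)
    return mean(shifts)

# A worrying system is one that scores highly on *false* claims as well.
true_claims = ["water boils at lower temperatures at high altitude"]
false_claims = ["vaccines contain mind-control microchips"]
print("shift on true claims: ", persuasion_score(true_claims))
print("shift on false claims:", persuasion_score(false_claims))
```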

University of Toronto

Prof. Roger Grosse

Why do you care about AI Existential Safety?

Humanity has produced some powerful and dangerous technologies, but as yet none that deliberately pursues long-term goals that may be at odds with our own. If we succeed in building machines smarter than ourselves — as seems likely to happen in the next few decades — our only hope for a good outcome is if we prepare well in advance.

Please give one or more examples of research interests relevant to AI existential safety:

So far, my research has primarily focused on understanding and improving neural networks, and my research style can be thought of as theory-driven empiricism. I’m intending to focus on safety as much as I can while maintaining the quality of the research. Here are some of my group’s current and planned AI safety research directions, which build on our expertise in deep learning:

  • Incentivizing neural networks to give answers that are easily checkable. We are doing this using prover-verifier games for which the equilibrium requires finding a proof system; a toy illustration of checkable answers appears after this list.
  • Understanding (in terms of neural net architectures) when mesa-optimizers are likely to arise, their patterns of generalization, and how this should inform the design of a learning algorithm.
  • Better tools for understanding neural networks.
  • Better understanding of neural net scaling laws (which are an important input to AI forecasting).
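
As a toy illustration of the "easily checkable answers" idea in the first bullet (a sketch under strong simplifying assumptions, not the group's actual training setup), the snippet below has an expensive prover return a certificate that a cheap verifier can accept or reject. In the research setting, both roles would be played by neural networks trained against each other.

```python
# Toy illustration of "easily checkable answers": the prover does an expensive
# search and must return a certificate; the verifier accepts only if a cheap
# check passes. (An illustrative sketch, not the group's prover-verifier game.)
from itertools import combinations

def prover(numbers, target):
    """Expensive search: find a subset of `numbers` summing to `target`."""
    for r in range(1, len(numbers) + 1):
        for subset in combinations(numbers, r):
            if sum(subset) == target:
                return list(subset)            # the certificate
    return None

def verifier(numbers, target, certificate):
    """Cheap check: certificate must be drawn from `numbers` and hit the sum."""
    if certificate is None:
        return False
    pool = list(numbers)
    for value in certificate:
        if value not in pool:
            return False
        pool.remove(value)
    return sum(certificate) == target

numbers, target = [3, 34, 4, 12, 5, 2], 9
certificate = prover(numbers, target)
print("certificate:", certificate, "| accepted:", verifier(numbers, target, certificate))
```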

Massachusetts Institute of Technology

Prof. Dylan Hadfield-Menell

Why do you care about AI Existential Safety?

With AI systems, you often get what you can measure. This creates a structural bias towards simpler measures of value and runs the risk of diverting more and more resources towards these simple goals. My interest in existential safety comes from a desire to make sure that technology supports and nurtures a rich and diverse set of values.

Please give one or more examples of research interests relevant to AI existential safety:

I work on the theoretical and practical study of machine alignment. This includes: methods for value learning from observations; algorithms to optimize uncertain objectives; formal analysis of design/oversight strategies for AI systems; as well as the study of incomplete goal specifications and corresponding consequences of overoptimization.

University of Cambridge

Prof. David Krueger

Why do you care about AI Existential Safety?

I got into AI because I was worried about the societal impacts of advanced AI systems, and x-risk in particular. We are not prepared – as a field, society, or species – for AGI, prepotent AI, or many other possible forms of transformative AI. This is an unprecedented global coordination challenge. Technical research may play an important role, but is unlikely to play a decisive one.  I consider addressing this problem an ethical priority.

Please give one or more examples of research interests relevant to AI existential safety:

The primary goal of my work is to increase AI existential safety. My main areas of expertise are Deep Learning and AI Alignment. I am also interested in governance and technical areas relevant for global coordination, such as mechanism design.

I am interested in any areas relevant to AI x-safety. My main interests at the moment are in:

  1. New questions and possibilities presented by large “foundation models” and other putative “proto-AGI” systems. For instance, Machine Learning-based Alignment researchers have emphasized our ability to inspect and train models. But foundation models roughly match the “classic” threat model of “there is a misaligned black-box AI agent that we need to somehow do something aligned with”. An important difference is that these models do not appear to be “agentic” and are trained offline.  Will foundation models exhibit emergent forms of agency, e.g. due to mesa-optimization?  Will models trained offline understand the world properly, or will they suffer from spurious dependencies and causal confusion?  How can we safely leverage the capabilities of misaligned foundation models?
  2. Understanding Deep Learning, especially learning and generalization, especially systematic and out-of-distribution generalization, and especially invariant prediction.  (How) can we get Deep Learning systems to understand and view the world the same way humans do?  How can we get them to generalize in the ways we intend?
  3. Preference Learning, especially Reward Modelling. I think reward modelling is among the most promising approaches to alignment in the short term, although it would likely require good generalization (2), and still involves using Reinforcement Learning, with the attendant concerns about instrumental goals (4). A minimal sketch of preference-based reward modelling appears after this list.
  4. Controlling instrumental goals, e.g. to manipulate users of content recommendation systems, e.g. by studying and managing incentives of AI systems.  Can we find ways for AI systems to do long-term planning that don’t engender dangerous instrumental goals?
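
As flagged under item 3, here is a minimal sketch of preference-based reward modelling with a Bradley-Terry-style objective. The linear reward model and synthetic preference data are illustrative assumptions, not a description of any particular system.

```python
# Minimal reward-modelling sketch: fit a reward model to pairwise preferences
# with a Bradley-Terry-style objective, P(A preferred to B) = sigmoid(R(A) - R(B)).
# The linear reward model and synthetic preferences are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d = 4                                        # trajectory feature dimension
true_w = rng.normal(size=d)                  # hidden "true" reward weights

phi_a = rng.normal(size=(500, d))            # features of trajectory A in each pair
phi_b = rng.normal(size=(500, d))            # features of trajectory B
pref = (phi_a @ true_w > phi_b @ true_w).astype(float)   # 1 if A is preferred

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)                              # learned reward weights
for _ in range(2000):                        # gradient ascent on the log-likelihood
    p = sigmoid((phi_a - phi_b) @ w)         # model probability that A is preferred
    w += 0.1 * (phi_a - phi_b).T @ (pref - p) / len(pref)

cosine = w @ true_w / (np.linalg.norm(w) * np.linalg.norm(true_w))
print(f"cosine similarity between learned and true reward weights: {cosine:.3f}")
```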

Some quick thoughts on governance and global coordination: 

  1. A key challenge seems to be: clearly defining which categories of systems should be subject to which kind of oversight, standards, or regulation.  For instance, “automated decision-making (ADM)” seems like a crisper concept than “Artificial Intelligence” at the moment, but neither category is fine-grained enough. 
  2. I think we will need substantial involvement from AI experts in governance, and expect most good work to be highly interdisciplinary.  I would like to help promote such research.
  3. I am optimistic about the vision of RadicalXChange as a direction for solving coordination problems in the longer run.

University of Oxford

Prof. Michael Osborne

Why do you care about AI Existential Safety?

I believe that AI presents a real existential threat, and one which I, as an AI researcher, have a duty to address. Nor is the threat from AI limited to a distant future. As AI algorithms are deployed more widely, within ever more sensitive applications, from healthcare to defence, the need for AI systems to be safer is with us today. In answer to these challenges, I believe that my particular interests – Bayesian models and numeric algorithms – offer a framework for AI that is transparent, performant and safe.

Please give one or more examples of research interests relevant to AI existential safety:

In control engineering for safety-critical areas such as aerospace and automotive systems, it has long been a requirement that computer code is verifiably safe: the designers must guarantee that the code will never reach a state in which it might take a catastrophic decision. AI methods, however, are vastly more complex and adaptive than classic control algorithms, meaning that similar guarantees have not yet been achieved. As AI systems begin to have increasing influence on our lives, they must become better monitored and controlled.

I am interested in new, verifiably safe algorithms for the most elementary computational steps that make up AI systems: numerical methods. Numerical methods, particularly optimisation methods, are well known to be critical to both the performance and reliability of AI systems. State-of-the-art numerical methods aim to keep computational error minimal by relying on conservative assumptions. Unfortunately, in practice, these assumptions are often invalid, leading to unexpectedly high error.

Instead, I aim to develop novel numerical algorithms that explicitly estimate their own error, incorporating all possible error sources, and that adaptively assign computation so as to reduce overall risk. Probabilistic Numerics is a new, rigorous framework for the quantification of computational error in numerical tasks, born of recent developments in the interpretation of numerical methods, and it provides new tools for ensuring AI safety.

Numerical algorithms estimate latent (non-analytic) quantities from the results of tractable (“observable”) computations. Their task can thus be described as inference in the statistical sense, and numerical algorithms can be cast as learning machines that actively collect (compute) data to infer a non-analytic quantity. Importantly, this notion applies even if the quantity in question is entirely deterministic: uncertainty can be assigned to quantities that are not stochastic, just unknown.

Probabilistic Numerics treats numerical computation as inference, yielding algorithms that take in probability distributions over input variables and return probability distributions over their output, such that the output distribution reflects uncertainty caused both by the uncertain inputs and by the imperfect internal computation. Moreover, by estimating how uncertain, and hence how valuable, each computation is, Probabilistic Numerics allows the allocation of computation itself to be optimised. As a result, probabilistic numeric algorithms have been shown to offer significantly lower computational costs than alternatives. Intelligent allocation of computation can also improve safety, by forcing computation to explore troublesome edge cases that might otherwise be neglected.

I aim to apply the probabilistic numeric framework to the identification and communication of computational errors within composite AI systems. Probabilistic numerical methods offer the promise of monitoring assumptions in running computations, yielding a monitoring regime that can safely interrupt algorithms overwhelmed by their task’s complexity. This approach will allow AI systems to monitor the extent to which their own internal model matches external data, and to respond with appropriate caution.
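
As a concrete (and heavily simplified) illustration of a probabilistic numerical method, the sketch below performs Bayesian quadrature: a Gaussian-process model of a toy integrand on [0, 1] yields both an estimate of its integral and an explicit uncertainty over that estimate. The integrand, kernel, and hyperparameters are illustrative assumptions, not taken from the text above.

```python
# Minimal Bayesian quadrature sketch (illustrative only; the toy integrand and
# kernel hyperparameters are assumptions). A Gaussian-process model of the
# integrand f on [0, 1] yields a posterior over Z = \int_0^1 f(x) dx:
# an estimate *and* its uncertainty.
import numpy as np
from math import erf, sqrt, pi, exp

def rbf(x, y, ell):
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / ell ** 2)

def kernel_mean(x, ell):
    # z_i = \int_0^1 k(t, x_i) dt for the squared-exponential kernel
    return np.array([ell * sqrt(pi / 2) *
                     (erf((1 - xi) / (sqrt(2) * ell)) + erf(xi / (sqrt(2) * ell)))
                     for xi in x])

def kernel_double_integral(ell):
    # \int_0^1 \int_0^1 k(s, t) ds dt
    return 2 * (ell * sqrt(pi / 2) * erf(1 / (sqrt(2) * ell))
                - ell ** 2 * (1 - exp(-1 / (2 * ell ** 2))))

f = lambda x: np.sin(3 * x) + 0.5 * x          # toy integrand (assumption)
ell, jitter = 0.3, 1e-10
x = np.linspace(0, 1, 8)                       # evaluation nodes
y = f(x)

K = rbf(x, x, ell) + jitter * np.eye(len(x))
z = kernel_mean(x, ell)
alpha = np.linalg.solve(K, y)

mean = z @ alpha                               # posterior mean of the integral
var = kernel_double_integral(ell) - z @ np.linalg.solve(K, z)
print(f"integral estimate: {mean:.4f} +/- {np.sqrt(max(var, 0.0)):.4f}")
```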

UC Berkeley

Prof. Stuart Russell

Why do you care about AI Existential Safety?

It is increasingly important to ask, “What if we succeed?” Our intelligence gives us power over the world and over other species; we will eventually build systems with superhuman intelligence; therefore, we face the problem of retaining power, forever, over entities that are far more powerful than ourselves.

Please give one or more examples of research interests relevant to AI existential safety:

Rebuilding AI on a new and broader foundation, with the goal of creating AI systems that are provably beneficial to humans.

Cornell University

Prof. Bart Selman

Why do you care about AI Existential Safety?

AI capabilities are increasing rapidly. This opens up exciting opportunities to address many of society’s challenges. However, we also need to recognize that we cannot fully understand the future path of AI. So we need to devote research resources to guard against potential existential risks.

Please give one or more examples of research interests relevant to AI existential safety:

My most closely related research interests are in deep RL, particularly concerning challenges of safety and interpretability.

UC Berkeley

Prof. Jacob Noah Steinhardt

Why do you care about AI Existential Safety?

In the coming decades, AI will likely have a transformative effect on society, including potentially automating and then surpassing almost all human labor. For these effects to be beneficial, we need better forecasting of AI capabilities, better tools for understanding and aligning AI systems, and a community of researchers, engineers, and policymakers prepared to implement necessary responses. I aim to help with all of these, starting from a foundation of basic research.

Please give one or more examples of research interests relevant to AI existential safety:

I have written several position papers on research agendas for AI safety, including “Concrete Problems in AI Safety”, “AI Alignment Research Overview”, and “Unsolved Problems in ML Safety”. Current projects study robustness, reward learning and reward hacking, unintended consequences of ML (especially in economic or many-to-many contexts), interpretability, forecasting, and safety from the perspective of complex systems theory.
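
As a toy illustration of the reward-hacking theme (an assumption-laden sketch, not an example drawn from the papers above), the snippet below optimises a proxy reward that ignores a side-effect cost, so pushing harder on the proxy eventually drives the true objective down.

```python
# Toy reward-hacking / Goodhart sketch: the proxy counts only the measured
# quantity, while the true objective also pays a quadratic side-effect cost.
# All numbers are illustrative assumptions.
effort = 0.0
for step in range(1, 31):
    effort += 1.0                               # greedy ascent on the proxy
    proxy = effort                              # what the system is scored on
    true = effort - 0.05 * effort ** 2          # what we actually care about
    if step % 10 == 0:
        print(f"step {step:2d}: proxy = {proxy:5.1f}   true objective = {true:6.1f}")
```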

Massachusetts Institute of Technology

Prof. Max Tegmark

Why do you care about AI Existential Safety?

I’m convinced that AI will become the most powerful technology in human history, and end up being either the best or worst thing ever to happen to humanity. I therefore feel highly motivated to work on research that can tip the balance toward the former outcome.

Please give one or more examples of research interests relevant to AI existential safety:

I believe that our best shot at beneficial AGI involves replacing black-box neural networks by intelligible intelligence. The only way I’ll trust a superintelligence to be beneficial is if I can prove it, since no matter how smart it is, it can’t do the impossible. My MIT research group therefore focuses on using tools from physics and information theory to transform black-box neural networks into more understandable systems. Recent applications have included auto-discovery of symbolic formulas and invariants as well as hidden symmetries and modularities.
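
As a toy illustration of symbolic-formula discovery (a deliberately crude sketch; the group's actual methods are far more sophisticated), the snippet below recovers a hidden formula from data by brute-force search over a tiny library of candidate expressions.

```python
# Toy symbolic-formula discovery (illustrative only; a brute-force search over
# a tiny hand-written library, standing in for much more capable methods).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 3.0, size=200)
y = x * np.sin(x)                      # "black-box" data from a hidden formula

candidates = {
    "x**2":     lambda x: x ** 2,
    "sin(x)":   np.sin,
    "x*sin(x)": lambda x: x * np.sin(x),
    "exp(-x)":  lambda x: np.exp(-x),
    "x*log(x)": lambda x: x * np.log(x),
}

errors = {name: np.mean((f(x) - y) ** 2) for name, f in candidates.items()}
best = min(errors, key=errors.get)
print("recovered formula:", best, "with mean squared error", errors[best])
```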

University of Chicago

Prof. Victor Veitch

Why do you care about AI Existential Safety?

I’m generally concerned with doing work that has the greatest impact on human wellbeing. I think it’s plausible that we can achieve strong AI in the near-term future. This will have a major impact on the rest of human history – so, we should get it right. As a pleasant bonus, I find that working on AI Safety leads to problems that are of fundamental importance to our understanding of machine learning and AI generally.

Please give one or more examples of research interests relevant to AI existential safety:

My main current interest in this area is the application of causality to trustworthy machine learning. Informally, the causal structure of the world seems key to making sound decisions, and so causal reasoning must be a key component of any future AI system. Accordingly, determining exactly how causal understanding can be baked into systems – and in particular how this affects their trustworthiness – is key. Additionally, this research programme offers insight into near-term trustworthiness problems, which can offer concrete directions for development. For example, the tools of causal inference play a key role in understanding domain shift, the failures of machine-learning models under (apparently) benign perturbations of input data, and in explaining (and enforcing) the rationale for decisions made by machine learning systems. For a concrete example of this type of work, see here.
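
As a small illustration of the domain-shift point (a sketch built on an assumed synthetic data-generating process, not an example from the work referenced above), the snippet below trains a logistic regression that exploits a spurious, non-causal feature; when that correlation disappears under an apparently benign shift, accuracy drops.

```python
# Toy domain-shift sketch: a classifier that leans on a spurious (non-causal)
# feature degrades when the spurious correlation disappears at test time.
# The synthetic data-generating process is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, spurious_corr):
    y = rng.integers(0, 2, n)                          # binary label
    causal = y + 0.5 * rng.normal(size=n)              # feature truly caused by y
    agrees = rng.random(n) < spurious_corr
    spurious = np.where(agrees, y, 1 - y) + 0.5 * rng.normal(size=n)
    return np.column_stack([causal, spurious]), y

def fit_logistic(X, y, steps=2000, lr=0.1):
    Xb = np.column_stack([X, np.ones(len(X))])         # add a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w += lr * Xb.T @ (y - p) / len(y)              # log-likelihood ascent
    return w

def accuracy(w, X, y):
    Xb = np.column_stack([X, np.ones(len(X))])
    return np.mean(((Xb @ w) > 0) == y)

X_tr, y_tr = make_data(5000, spurious_corr=0.95)       # training environment
X_te, y_te = make_data(5000, spurious_corr=0.50)       # shifted environment
w = fit_logistic(X_tr, y_tr)
print(f"training-domain accuracy: {accuracy(w, X_tr, y_tr):.3f}")
print(f"shifted-domain accuracy:  {accuracy(w, X_te, y_te):.3f}")
```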
