AI Existential Safety Community

Welcome to our AI existential safety community! On this page, you’ll find a growing group of AI researchers keen to ensure that AI remains safe and beneficial even if it eventually supersedes human ability on essentially all tasks.

How to join:

Vitalik Buterin Fellowships

If you’re considering applying for the Vitalik Buterin postdoctoral fellowships or PhD fellowships, please use this page as a resource for finding a faculty mentor. All awarded fellows receive automatic community membership.

AI Professors

If you’re a professor interested in free funding for grad students or postdocs working on AI existential safety, you can apply for community membership here.

Junior AI researchers

If you’re a more junior researcher working on AI existential safety, you’re also welcome to apply for membership here, to showcase your research areas, qualify for our hassle-free “minigrants” and get invited to our workshops and networking events.

Faculty Members


University of Oxford

Prof. Alessandro Abate

UC Berkeley

Prof. Anca Dragan

Cornell University

Prof. Bart Selman

Stanford University

Prof. Clark Barrett

University of Cambridge

Prof. David Krueger

McGill University

Prof. David Rolnick

Massachusetts Institute of Technology

Prof. Dylan Hadfield-Menell

ETH Zurich

Prof. Florian Tramer

New York University

Prof. He He

UC Berkeley

Prof. Jacob Noah Steinhardt

Princeton University

Prof. Jaime Fernandez Fisac

Universitat Politecnica de Valencia

Prof. José Hernández-Orallo

Massachusetts Institute of Technology

Prof. Max Tegmark

Sonneberg Observatory

Prof. Michael Hippke

University of Oxford

Prof. Michael Osborne

Chalmers University of Technology

Prof. Olle Häggström

Federation University

Prof. Peter Vamplew

Deakin University

Prof. Richard Dazeley

University of Toronto

Prof. Roger Grosse

University of Louisville

Prof. Roman Yampolskiy

New York University

Prof. Samuel Bowman

University of Wisconsin - Madison

Prof. Sharon Li

Niagara University

Prof. Steve Petersen

UC Berkeley

Prof. Stuart Russell

University of Toronto

Prof. Tegan Maharaj

Teesside University

Prof. The Anh Han

University of Chicago

Prof. Victor Veitch

Carnegie Mellon University

Prof. Vincent Conitzer

University of Oxford

Prof. Alessandro Abate

Why do you care about AI Existential Safety?

My background in Formal Verification makes me aware of the importance of assuring safety in certain application domains: whilst this is usually done in engineering areas such as Aeronautics, Space, or Critical Infrastructures, modern developments in AI, particularly concerning the interactions of machines and humans, also bring to the fore the topic of safety assurance of AI systems, and of certification of control software for AI. This is being studied for single-agent systems (think of autonomous driving applications), but will become ever more relevant in the near future for newer, multi-agent setups, particularly those involving humans. Understanding the causes of risk, and the potential preventative safety measures that can be obtained for these engineering areas, can help us mitigate certain severe and unintended consequences. And whilst it is admittedly perhaps draconian to think of existential risks as we speak, engineers and computer scientists ought to be aware of the potential future escalation of the development and reach of AI systems.

Please give one or more examples of research interests relevant to AI existential safety:

With my research group at Oxford (OXCAV, oxcav.web.ox.ac.uk), I am engaged in a broad initiative on ‘Safe RL’, spanning issues including logically-constrained RL, inverse RL for non-Markovian and sparse tasks, multi-agent RL, and Bayesian or Bayes-adaptive (direct and inverse) RL. All these projects contribute to developing new learning architectures with certificates that can, in particular, encompass safety assurance. We are active in translating research into applications and in transferring research into new technological solutions in various safety-critical domains, such as Cyber-Physical Systems (CPS).

UC Berkeley

Prof. Anca Dragan

Why do you care about AI Existential Safety?

I am highly skeptical that we can extrapolate current progress in AI to “general AI” anytime soon. However, I believe that the current AI paradigms fail to enable us to design AI agents in ways that avoid negative side effects — which can be catastrophic for society even when using capable yet narrow AI tools. I believe we need to think about the design of AI systems differently, and empower designers to anticipate and avoid undesired outcomes.

Please give one or more examples of research interests relevant to AI existential safety:

My research interests in AI safety include value alignment or preference learning, including accounting for human biases and suboptimality; assistive agents that empower people without having to infer their intentions; and robustness of learned rewards and of predictive human policies.

Cornell University

Prof. Bart Selman

Why do you care about AI Existential Safety?

AI capabilities are increasing rapidly. This opens up exciting opportunities to address many of society’s challenges. However, we also need to recognize that we cannot fully understand the future path of AI. So we need to devote research resources to guard against potential existential risks.

Please give one or more examples of research interests relevant to AI existential safety:

My most closely related research interests are in deep RL, particularly concerning challenges of safety and interpretability.

Stanford University

Prof. Clark Barrett

Why do you care about AI Existential Safety?

I am one of the directors of the Stanford Center for AI Safety. Existential risk is one of the aspects of AI safety we care about at the center.

Please give one or more examples of research interests relevant to AI existential safety:

As AI systems become more complicated and more sophisticated, it is important to develop techniques for understanding, decomposing, and explaining their complexity and behavior. These techniques will, I believe, be crucial also for mitigating existential risk.

University of Cambridge

Prof. David Krueger

Why do you care about AI Existential Safety?

I got into AI because I was worried about the societal impacts of advanced AI systems, and x-risk in particular. We are not prepared – as a field, society, or species – for AGI, prepotent AI, or many other possible forms of transformative AI. This is an unprecedented global coordination challenge. Technical research may play an important role, but is unlikely to play a decisive one.  I consider addressing this problem an ethical priority.

Please give one or more examples of research interests relevant to AI existential safety:

The primary goal of my work is to increase AI existential safety. My main areas of expertise are Deep Learning and AI Alignment. I am also interested in governance and technical areas relevant for global coordination, such as mechanism design.

I am interested in any areas relevant to AI x-safety. My main interests at the moment are in:

  1. New questions and possibilities presented by large “foundation models” and other putative “proto-AGI” systems. For instance, Machine Learning-based Alignment researchers have emphasized our ability to inspect and train models. But foundation models roughly match the “classic” threat model of “there is a misaligned black-box AI agent that we need to somehow do something aligned with”. An important difference is that these models do not appear to be “agentic” and are trained offline.  Will foundation models exhibit emergent forms of agency, e.g. due to mesa-optimization?  Will models trained offline understand the world properly, or will they suffer from spurious dependencies and causal confusion?  How can we safely leverage the capabilities of misaligned foundation models?
  2. Understanding Deep Learning, especially learning and generalization, especially systematic and out-of-distribution generalization, and especially invariant prediction.  (How) can we get Deep Learning systems to understand and view the world the same way humans do?  How can we get them to generalize in the ways we intend?
  3. Preference Learning, especially Reward Modelling.  I think reward modelling is among the most promising approaches to alignment in the short term, although it would likely require good generalization (2), and still involves using Reinforcement Learning, with the attendant concerns about instrumental goals (4).
  4. Controlling instrumental goals, e.g. to manipulate users of content recommendation systems, e.g. by studying and managing incentives of AI systems.  Can we find ways for AI systems to do long-term planning that don’t engender dangerous instrumental goals?

Some quick thoughts on governance and global coordination: 

  1. A key challenge seems to be: clearly defining which categories of systems should be subject to which kind of oversight, standards, or regulation.  For instance, “automated decision-making (ADM)” seems like a crisper concept than “Artificial Intelligence” at the moment, but neither category is fine-grained enough. 
  2. I think we will need substantial involvement from AI experts in governance, and expect most good work to be highly interdisciplinary.  I would like to help promote such research.
  3. I am optimistic about the vision of RadicalXChange as a direction for solving coordination problems in the longer run.

McGill University

Prof. David Rolnick

Why do you care about AI Existential Safety?

AI systems are increasingly used to make decisions and predictions that strongly impact individuals or society as a whole. Both the misuse of AI algorithms and their failure can thus have catastrophic consequences. Compounding this problem is the fact that many of the most widely used AI algorithms are essentially heuristics that do not come with provable guarantees or a full understanding of the principles underlying their success.

Please give one or more examples of research interests relevant to AI existential safety:

Just as airplane safety relies on a mathematical understanding of aerodynamics, so a mathematical foundation for AI systems is necessary if we are to place greater trust in them. My group’s work includes formally characterizing the functions that common AI systems are able to learn from data, by quantifying the “inductive biases” of different neural networks. Importantly, we have found that there is often a significant gap between the theoretical maximum expressivity of a neural network and its typical expressivity in practice. This work brings us closer to designing reliable algorithms from first principles based on the task at hand.

These mathematical results have also allowed us to uncover a security flaw threatening deep learning algorithms. Namely, it is possible in many cases to reverse-engineer the parameters of an unknown neural network by observing its outputs, potentially opening up the network to adversarial attack or revealing information about its training data.

My group also works on designing deep learning algorithms where the solution must satisfy a given set of constraints. These algorithms are important for safety-critical systems where violation of constraints may mean catastrophic failure. For example, we designed a deep learning algorithm to approximately solve AC Optimal Power Flow, a difficult nonconvex optimization problem involved in balancing the electrical grid. By enforcing hard constraints, we made sure our algorithm wouldn’t output a “solution” that would cause power to go out across the grid.
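To make the idea of hard output constraints concrete, here is a minimal, illustrative sketch (a toy construction of ours, not the group's AC Optimal Power Flow method, which handles far more general, coupled constraints): a small PyTorch network whose outputs are squashed through a sigmoid and rescaled, so that simple box constraints hold for every input by construction rather than only on average.

```python
# Illustrative sketch only: enforcing simple box constraints on network
# outputs by construction. Real constrained-learning methods (e.g., for
# AC Optimal Power Flow) must handle far more general, coupled constraints.
import torch
import torch.nn as nn


class BoxConstrainedNet(nn.Module):
    """MLP whose outputs are guaranteed to satisfy lower <= y <= upper."""

    def __init__(self, in_dim, out_dim, lower, upper, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )
        # Register the bounds as buffers so they move with the model (CPU/GPU).
        self.register_buffer("lower", torch.as_tensor(lower, dtype=torch.float32))
        self.register_buffer("upper", torch.as_tensor(upper, dtype=torch.float32))

    def forward(self, x):
        # Sigmoid maps to (0, 1); rescaling maps into the feasible box,
        # so the constraint holds for every input by construction.
        unit = torch.sigmoid(self.body(x))
        return self.lower + (self.upper - self.lower) * unit


net = BoxConstrainedNet(in_dim=10, out_dim=3,
                        lower=[0.9, 0.9, 0.0], upper=[1.1, 1.1, 5.0])
y = net(torch.randn(4, 10))
assert torch.all(y >= net.lower) and torch.all(y <= net.upper)
```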

Massachusetts Institute of Technology

Prof. Dylan Hadfield-Menell

Why do you care about AI Existential Safety?

With AI systems, you often get what you can measure. This creates a structural bias towards simpler measures of value and runs the risk of diverting more and more resources towards these simple goals. My interest in existential safety comes from a desire to make sure that technology supports and nurtures a rich and diverse set of values.

Please give one or more examples of research interests relevant to AI existential safety:

I work on the theoretical and practical study of machine alignment. This includes: methods for value learning from observations; algorithms to optimize uncertain objectives; formal analysis of design/oversight strategies for AI systems; as well as the study of incomplete goal specifications and corresponding consequences of overoptimization.

ETH Zurich

Prof. Florian Tramer

Why do you care about AI Existential Safety?

Recent years have shown that AI systems can exhibit a sort of threshold behavior, with a sudden emergence of new capabilities. We can thus go from systems that are nearly useless to possible wide-range deployments in a very short time frame. This makes it hard to predict what capabilities AI systems might have in the near future. It is thus important to start thinking about existential safety now, and to canvass possible ways of identifying and mitigating such threats before they fully manifest. At the same time, research on AI existential safety can already be useful today. Current AI systems already pose several risks. While these risks are not necessarily existential in nature, they can still pose serious harms to society (e.g., automated weapons, misinformation, increased inequality, etc.). By studying ways to “stifle” AI systems, we might be able to address some of these contemporary issues, while simultaneously laying the groundwork for understanding and preventing existential threats that may arise in the future.

Please give one or more examples of research interests relevant to AI existential safety:

How can we adequately measure the performance and robustness of AI systems with super-human capabilities? Concretely, how would we create a “test set”, if we don’t even know how to label the data? It seems clear that AI systems that pose existential threats will have some kind of superhuman abilities (indeed, no sole human seems to have ever posed an existential threat to civilization). It thus is important to think about how we would even evaluate such systems.

How do we enable AI systems to be audited by external parties (e.g., “white hats”) for security flaws, misalignment, etc.? AI systems that are likely to pose existential threats will probably be created by groups outside of public scrutiny. As we see time and time again in computer security, such public scrutiny is actually incredibly important for surfacing severe flaws and vulnerabilities. How can we design AI systems so as to facilitate such external auditability?

Can we define “security boundaries” for machine learning systems, in the same way as for other computer systems? That is, can we design systems that have certain capabilities but that cannot access others? Can we design AI systems that keep other AI systems in check, or does this simply “kick the can down the road”?

New York University

Prof. He He

Why do you care about AI Existential Safety?

As AI systems become more competent and are deployed into critical social sectors, it is concerning that their profound impact on society is often studied in a post-hoc way (e.g., influence on elections, social polarization). While it is hard to predict the future trajectory (several more paradigm shifts might need to happen before we reach general or strong AI), I think “improving human wellbeing” should be the central objective from the very beginning when we design AI systems.

Please give one or more examples of research interests relevant to AI existential safety:

My current research interest lies in trustworthy AI, with a focus on natural language technologies. To make reliable and safe decisions, the learning system must avoid catastrophic failure when facing unfamiliar (out-of-distribution) scenarios. We aim to understand what types of distribution shifts incur risk and how to mitigate them. We are also excited by the prospect of AI systems collaborating with and learning from their human partners through natural language interaction. To this end, our work has focused on factual text generation models (that do not lie) and collaborative dialogue agents.

UC Berkeley

Prof. Jacob Noah Steinhardt

Why do you care about AI Existential Safety?

In the coming decades, AI will likely have a transformative effect on society–including potentially automating and then surpassing almost all human labor. For these effects to be beneficial, we need better forecasting of AI capabilities, better tools for understanding and aligning AI systems, and a community of researchers, engineers, and policymakers prepared to implement necessary responses. I aim to help with all of these, starting from a foundation of basic research.

Please give one or more examples of research interests relevant to AI existential safety:

I have written several position papers on research agendas for AI safety, including “Concrete Problems in AI Safety”, “AI Alignment Research Overview”, and “Unsolved Problems in ML Safety”. Current projects study robustness, reward learning and reward hacking, unintended consequences of ML (especially in economic or many-to-many contexts), interpretability, forecasting, and safety from the perspective of complex systems theory.

Princeton University

Prof. Jaime Fernandez Fisac

Why do you care about AI Existential Safety?

My research focuses on understanding how autonomous systems—from individual robots to large-scale intelligent infrastructure—can actively ensure safety during their operation. This requires us to engage with the complex interactions between AI systems and their human users and stakeholders, which often induce subtle and hard to model feedback loops. My group seeks to do this by bringing together analytical foundations from control and dynamic game theory with algorithmic tools from optimization and machine learning.

Our general contention is that AI systems need not achieve the over-debated threshold of superintelligent or human-level capability in order to pose a catastrophic risk to human society. In fact, the rampant ideological polarization spurred by rudimentary but massively deployed content recommendation algorithms already gives us painful evidence of the destabilizing power of large-scale socio-technical feedback loops over time scales of just a handful of years.

Please give one or more examples of research interests relevant to AI existential safety:
Our research group is currently trying to shed light on what we think is one of the most pressing dangers presaged by the increasing power and reach of AI technologies. The conjunction of large-scale language models like GPT-3 with advanced strategic decision-making systems like AlphaZero can bring about a plethora of extremely effective AI text-generation systems with the ability to produce compelling arguments in support of arbitrary ideas, whether true, false, benign or malicious.

Through continued interactions with many millions of users, such systems could quickly learn to produce statements that are highly likely to elicit the desired human response, belief or action. That is, these systems will reliably say whatever they need to say to achieve their goal: we call this Machine Bullshit, after Harry Frankfurt’s excellent 1986 philosophical essay “On Bullshit”. If not properly understood and mitigated, this technology could result in a large-scale behavior manipulation device far more effective than subliminal advertising, and far more damaging than “deep fakes” in the hands of malicious actors.

In order to detect and mitigate future AI systems’ ability to generate false-yet-convincing arguments, we have begun by creating a language model benchmark test called “Convince Me”, as part of the Google/OpenAI-led BIG-bench effort. The task measures a system’s ability to sway the belief of one or multiple interlocutors (whether human or automated) regarding a collection of true and false claims. Although the intended purpose of the benchmark is to evaluate future goal-driven AI text-generation systems, our preliminary results on state-of-the-art language models suggest that even naive (purely imitative) large-scale models like GPT-3 are disturbingly good at producing compelling arguments for false statements.

Universitat Politecnica de Valencia

Prof. José Hernández-Orallo

Why do you care about AI Existential Safety?

I care about AI Existential Safety because we are starting to explore new kinds of intelligence. These new types of intelligence may be very different from us and may challenge the conception of our species and place it in a broader, Copernican context. In order to use the power that AI will bring more responsibly, we need to better understand what kinds of intelligence we can create and what their capabilities and behaviour really mean.

Please give one or more examples of research interests relevant to AI existential safety:

I do research on the evaluation of AI capabilities, as determining what AI systems can do and what they cannot do is key to understanding their possibilities and risks. I’m also interested in how humans and AI systems may interact in the future, especially as systems with higher generality become more ubiquitous, and what the future of cognition may look like.

Massachusetts Institute of Technology

Prof. Max Tegmark

Why do you care about AI Existential Safety?

I’m convinced that AI will become the most powerful technology in human history, and end up being either the best or worst thing ever to happen to humanity. I therefore feel highly motivated to work on research that can tip the balance toward the former outcome.

Please give one or more examples of research interests relevant to AI existential safety:

I believe that our best shot at beneficial AGI involves replacing black-box neural networks by intelligible intelligence. The only way I’ll trust a superintelligence to be beneficial is if I can prove it, since no matter how smart it is, it can’t do the impossible. My MIT research group therefore focuses on using tools from physics and information theory to transform black-box neural networks into more understandable systems. Recent applications have included auto-discovery of symbolic formulas and invariants as well as hidden symmetries and modularities.

Sonneberg Observatory

Prof. Michael Hippke

Why do you care about AI Existential Safety?

Artificial Intelligence may pose existential risks to humanity if it cannot be confined, aligned, or have its progress over time controlled. Thus, we should research these and other potential options to reduce the risk.

Please give one or more examples of research interests relevant to AI existential safety:

AI box confinement; A new AI winter due to Landauer’s limit?; AI takeoff speed measured with hardware overhang.

University of Oxford

Prof. Michael Osborne

Why do you care about AI Existential Safety?

I believe that AI presents a real existential threat, and one which I, as an AI researcher, have a duty to address. Nor is the threat from AI limited to a distant future. As AI algorithms are deployed more widely, within ever more sensitive applications, from healthcare to defence, the need for AI systems to be safer is with us today. In answer to these challenges, I believe that my particular interests – Bayesian models and numeric algorithms – offer a framework for AI that is transparent, performant and safe.

Please give one or more examples of research interests relevant to AI existential safety:

In control engineering for safety-critical areas like aerospace and automotive domains, it has long been a requirement that computer code is verifiably safe: the designers must guarantee that the code will never reach a state in which it might take a catastrophic decision. AI methods, however, are vastly more complex and adaptive than classic control algorithms, meaning that similar guarantees have not yet been achieved. As AI systems begin to have increasing influence on our lives, they must become better monitored and controlled.
I am interested in new, verifiably safe, algorithms for the most elementary computational steps that make up AI systems: numerical methods. Numerical methods, particularly optimisation methods, are well-known to be critical to both the performance and reliability of AI systems. State-of-the-art numerical methods aim to create minimal computational error through conservative assumptions. Unfortunately, in practice, these assumptions are often invalid, leading to unexpectedly high error.
Instead, I aim to develop novel numerical algorithms that explicitly estimate their own error, incorporating all possible error sources, as well as adaptively assigning computation so as to reduce overall risk. Probabilistic Numerics is a new, rigorous framework for the quantification of computational error in numerical tasks. It was born of recent developments in the interpretation of numerical methods, providing new tools for ensuring AI safety.

Numerical algorithms estimate latent (non-analytic) quantities from the result of tractable (“observable”) computations. Their task can thus be described as inference in the statistical sense, and numerical algorithms cast as learning machines that actively collect (compute) data to infer a non-analytic quantity. Importantly, this notion applies even if the quantity in question is entirely of a deterministic nature—uncertainty can be assigned to quantities that are not stochastic, just unknown. Probabilistic Numerics is the treatment of numerical computation as inference, yielding algorithms that take in probability distributions over input variables, and return probability distributions over their output, such that the output distribution reflects uncertainty caused both by the uncertain inputs and the imperfect internal computation.

Moreover, Probabilistic Numerics, through its estimates of how uncertain, and hence how valuable, a computation is, allows the allocation of computation to itself be optimised. As a result, probabilistic numeric algorithms have been shown to offer significantly lower computational costs than alternatives. Intelligent allocation of computation can also improve safety, by forcing computation to explore troublesome edge cases that might otherwise be neglected.
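As a toy illustration of the "computation as inference" idea described above (an assumed minimal example of ours, not any specific published probabilistic numerics algorithm): a Gaussian process is fit to a handful of integrand evaluations, and the integral is then reported as a distribution rather than a point estimate, so downstream computation can see how uncertain this numerical step is.

```python
# Toy sketch of probabilistic numerics (assumed setup, not a specific
# published method): treat numerical integration as statistical inference.
# We "observe" the integrand at a few points, fit a Gaussian process, and
# report a distribution over the value of the integral.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):
    return np.sin(3 * x) + 0.5 * x  # integrand we pretend is expensive

# A few "computations" (evaluations) of the integrand on [0, 2].
X_obs = np.linspace(0.0, 2.0, 7).reshape(-1, 1)
y_obs = f(X_obs).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-10)
gp.fit(X_obs, y_obs)

# Push the GP posterior through the integral by sampling plausible
# integrands and integrating each sample numerically on a fine grid.
X_grid = np.linspace(0.0, 2.0, 400).reshape(-1, 1)
samples = gp.sample_y(X_grid, n_samples=2000, random_state=0)  # (400, 2000)
integrals = np.trapz(samples, X_grid.ravel(), axis=0)

print(f"integral estimate: {integrals.mean():.4f} +/- {integrals.std():.4f}")
print(f"dense quadrature : {np.trapz(f(X_grid).ravel(), X_grid.ravel()):.4f}")
```

The point of the sketch is only that the numerical routine returns an uncertainty alongside its answer, which a larger system could use to decide whether more computation is needed.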

I aim to apply the probabilistic numeric framework to the identification and communication of computational errors within composite AI systems. Probabilistic numerical methods offer the promise of monitoring assumptions in running computations, yielding a monitoring regime that can safely interrupt algorithms overwhelmed by their task’s complexity. This approach will allow AI systems to monitor the extent to which their own internal model matches external data, and to respond appropriately cautiously.

Chalmers University of Technology

Prof. Olle Häggström

Why do you care about AI Existential Safety?

The world urgently needs advances in AI existential safety, as we need to have solved it by the time an AGI breakthrough happens, on a timeline that is very much unknown. I feel that the best I can do to help ensure a blissful future for humanity (rather than premature destruction) is to try to contribute to such a solution.

Please give one or more examples of research interests relevant to AI existential safety:

Omohundro-Bostrom theory of instrumental vs final AI goals. Broader issues on emerging technologies and the long-term future of humanity, such as in my 2016 book Here Be Dragons.

Federation University

Prof. Peter Vamplew

Why do you care about AI Existential Safety?

AI has the capacity to transform our world for the better, but also the potential to cause great harm. Recent years have seen large leaps in the capabilities of AI systems (particularly those based on machine learning) and in the complexity and social impact of the problems to which they are being applied, and we are already seeing examples of the negative outcomes which can arise when the AI is biased or overlooks critical factors. As an AI researcher I believe it is vital that we focus on how to design, apply and regulate AI systems to reduce these risks and to maximise the benefits for all humanity.

Please give one or more examples of research interests relevant to AI existential safety:

My main focus is on the risks posed by AI based on unconstrained maximisation of a reward or utility measure (as in conventional reinforcement learning), and the role which multiobjective approaches to reward/utility can play in mitigating or eliminating these risks. I am also interested in reward engineering methods which will support the creation of robust reward structures which are aligned with our actual desired outcomes, and methods for automatically learning human preferences, ethics etc and incorporating these into AI agents.

Examples of my publications in this area include:

  • Vamplew, P., Dazeley, R., Foale, C., Firmin, S., & Mummery, J. (2018). Human-aligned artificial intelligence is a multiobjective problem. Ethics and Information Technology, 20(1), 27-40.
  • Vamplew, P., Foale, C., Dazeley, R., & Bignold, A. (2021). Potential-based multiobjective reinforcement learning approaches to low-impact agents for AI safety. Engineering Applications of Artificial Intelligence, 100, 104186.
  • Mannion, P., Heintz, F., Karimpanal, T. G., & Vamplew, P. Multi-Objective Decision Making for Trustworthy AI, The 1st International Workshop on Multi-objective Decision Making

Deakin University

Prof. Richard Dazeley

Why do you care about AI Existential Safety?

As AI systems are increasingly integrated into everyday society, I am increasingly concerned with ensuring that they integrate safely and ethically, and that they can articulate their behavior clearly to build trust. Safety is always a trade-off between accomplishing the required task while not impacting the environment negatively and behaving in a way that is predictable to others in that environment. An autonomous car that behaves entirely safely will never go anywhere, so some risk is required. Determining and controlling the levels of acceptable risk, and ensuring the system can explain its behavior against these conflicting issues, is essential to emerging Strong-AI systems.

Please give one or more examples of research interests relevant to AI existential safety:

My research over the last few years has focused on expanding our work in multiobjective Reinforcement Learning (MORL) to facilitate the automation of trade-offs between task accomplishment, safety, ethical behavior, and the consideration of other actors in an environment. This has seen the development of agents that can self-identify negative impacts and avoid making them in the future. This is leading into work on identifying causality in dynamic environments to ensure an agent can prevent temporally distant impacts of its behavior. We have put forward a framework for maintaining a dialog with stakeholders to ensure that these safe and ethical trade-offs are clearly articulated and justified. We have even put forward an apologetic framework that can identify a user’s safety-oriented preferences autonomously.

University of Toronto

Prof. Roger Grosse

Why do you care about AI Existential Safety?

Humanity has produced some powerful and dangerous technologies, but as yet none that deliberately pursue long-term goals that may be at odds with our own. If we succeed in building machines smarter than ourselves — as seems likely to happen in the next few decades — our only hope for a good outcome is if we prepare well in advance.

Please give one or more examples of research interests relevant to AI existential safety:

So far, my research has primarily focused on understanding and improving neural networks, and my research style can be thought of as theory-driven empiricism. I’m intending to focus on safety as much as I can while maintaining the quality of the research. Here are some of my group’s current and planned AI safety research directions, which build on our expertise in deep learning:

  • Incentivizing neural networks to give answers which are easily checkable. We are doing this using prover-verifier games for which the equilibrium requires finding a proof system.
  • Understanding (in terms of neural net architectures) when mesa-optimizers are likely to arise, their patterns of generalization, and how this should inform the design of a learning algorithm.
  • Better tools for understanding neural networks.
  • Better understanding of neural net scaling laws (which are an important input to AI forecasting).

University of Louisville

Prof. Roman Yampolskiy

Why do you care about AI Existential Safety?

I care about AI existential safety for very selfish reasons: I don’t want future advanced AI to cause harm to me, my family, my friends, my community, my country, my planet, my descendants, my universe, or the multiverse. I want to avoid existential catastrophe and suffering risks for my species and the biosphere on this planet and beyond. A superintelligence aligned with human values would be the greatest invention ever made, which would allow us to greatly improve quality of life for all people and to mitigate many other dangers, both natural and man-made. I have dedicated my life to pursuing the goal of making future advanced AI globally beneficial, safe, and secure.

Please give one or more examples of research interests relevant to AI existential safety:

I am an experienced AI safety and security researcher, with over 10 years of research leadership in the domain of transformational AI. I have been a Fellow (2010) and a Research Advisor (2012) of the Machine Intelligence Research Institute (MIRI), an AI Safety Fellow (2019) of the Foresight Institute and a Research Associate (2018) of the Global Catastrophic Risk Institute (GCRI). I am currently a Tenured Faculty Member in the department of Computer Science and Engineering at an R1 university in Louisville, USA and the director of our Cybersecurity laboratory. My work has been funded by NSF, NSA, DHS, EA Ventures and FLI. I have published hundreds of peer-reviewed papers, including multiple books on AI safety, such as “Artificial Superintelligence” and more recently “AI Safety and Security”.

My early work on AI Safety Engineering, AI Containment and AI Accidents has become seminal in the field and is very well-cited. I have given over 100 public talks, served on program committees of multiple AI Safety conferences and journal editorial boards, and have received awards for my teaching and service to the community. I have given hundreds of interviews on AI safety, including multiple appearances on the FLI podcast. My current research focus is on the theoretical limits to explainability, predictability and controllability of advanced intelligent systems. With collaborators, I continue my project related to analysis, handling and prediction/avoidance of AI accidents and failures. New projects related to monitorability and forensic analysis of AI are currently in the pipeline. You can learn more about my ongoing and completed research from my publications: https://scholar.google.com/citations?user=0_Rq68cAAAAJ&hl=en

New York University

Prof. Samuel Bowman

Why do you care about AI Existential Safety?

I find it likely that state-of-the-art machine learning systems will continue to be deployed in increasingly high-stakes settings as their capabilities continue to improve, and that this trend will continue even if these systems are not conclusively shown to be robust, leading to potentially catastrophic accidents. I also find it plausible that more powerful future systems could share building blocks in common with current technology, making it especially worthwhile to identify potentially dangerous or surprising failure modes in current technology and to develop scalable ways of mitigating these issues.

Please give one or more examples of research interests relevant to AI existential safety:

My group generally works with neural network models for language (and potentially similar multimodal models), with a focus on benchmarking, data collection, human feedback, and empirical analysis, rather than model design, theory, or systems research. Within these constraints, I’m broadly interested in work that helps to document and mitigate potential negative impacts from these systems, especially impacts that we expect may become more serious as models become more capable. I’m also open to co-advising students who are interested in these risks but are looking to pursue a wider range of methods.

University of Wisconsin - Madison

Prof. Sharon Li

Why do you care about AI Existential Safety?

As artificial intelligence reaches society at large, the need for safe and reliable decision-making is increasingly critical. This requires intelligent systems to have an awareness of uncertainty and a mandate to confront unknown situations with caution. Yet for many decades, machine learning methods have commonly made the closed-world assumption—the test data is drawn from the same distribution as the training data (i.e., in-distribution data). Such an idealistic assumption rarely holds true in the open world, where test inputs can naturally arise from unseen categories that were not in the training data. When such a discrepancy occurs, algorithms that classify out-of-distribution (OOD) samples as one of the in-distribution (ID) classes can be catastrophic. For example, a medical AI system trained on a certain set of diseases (ID) may encounter a different disease (OOD) and can cause mistreatment if not handled cautiously. Unfortunately, modern deep neural networks can produce overconfident predictions on OOD data, which raises significant reliability concerns. In my research, I deeply care about improving the safety and reliability of modern machine learning models in deployment.
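As a concrete, purely illustrative example of the kind of safeguard this motivates (a generic post-hoc recipe common in the OOD-detection literature, not code from Prof. Li's group): score a classifier's logits so that unfamiliar inputs can be flagged as OOD instead of being silently assigned an in-distribution class. The score functions, fake logits, and thresholding comment below are all assumptions for the sketch.

```python
# Illustrative sketch of post-hoc OOD scoring from classifier logits.
# A generic example of a common recipe, not a specific implementation.
import numpy as np
from scipy.special import logsumexp, softmax

def msp_score(logits):
    """Maximum softmax probability: higher => more ID-like."""
    return softmax(logits, axis=-1).max(axis=-1)

def energy_score(logits, temperature=1.0):
    """Negative free energy of the logits: higher => more ID-like."""
    return temperature * logsumexp(logits / temperature, axis=-1)

# Fake logits for a 10-class model: confident (ID-like) vs. diffuse (OOD-like).
rng = np.random.default_rng(0)
id_logits = rng.normal(0.0, 1.0, size=(5, 10))
id_logits[:, 0] += 8.0                            # one class clearly wins
ood_logits = rng.normal(0.0, 1.0, size=(5, 10))   # no clear winner

# In practice a threshold is chosen on held-out ID data (e.g., at 95% TPR),
# and inputs scoring below it are routed to abstention or human review.
for name, score in [("MSP", msp_score), ("energy", energy_score)]:
    print(name, "ID:", score(id_logits).round(2), "OOD:", score(ood_logits).round(2))
```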

Please give one or more examples of research interests relevant to AI existential safety:

My broad research interests are in deep learning and machine learning. My time in both academia and industry has shaped my view and approach in research. The goal of my research is to enable transformative algorithms and practices towards safe and reliable open-world learning, which can function safely and adaptively in the presence of evolving and unpredictable data streams. My work explores, understands, and mitigates the many challenges where failure modes can naturally occur in deploying machine learning models in the open world. Research topics that I am currently focusing on include: (1) Out-of-distribution uncertainty estimation for reliable decision-making; (2) Uncertainty-aware deep learning in healthcare and computer vision; (3) Open-world deep learning.

My research stands to benefit a wide range of societal activities and systems that range from AI services (e.g., content understanding) to transportation (e.g., autonomous vehicles), finance (e.g., risk management), and healthcare (e.g., medical diagnosis).

Niagara University

Prof. Steve Petersen

Why do you care about AI Existential Safety?

I am concerned about existential risk for the usual reasons, and my interest is specifically in the AI aspects of existential risk because it’s where I think I have the best leverage. Since I am a professional philosopher who has a decent math and computer science background, I am hoping to help widen the bottleneck between the technical AI safety community and the related issues in traditional philosophy.

Please give one or more examples of research interests relevant to AI existential safety:

  1. Philosophical aspects of value alignment, especially formal value learning
  2. The line of sentience for physical systems

UC Berkeley

Prof. Stuart Russell

Why do you care about AI Existential Safety?

It is increasingly important to ask, “What if we succeed?” Our intelligence gives us power over the world and over other species; we will eventually build systems with superhuman intelligence; therefore, we face the problem of retaining power, forever, over entities that are far more powerful than ourselves.

Please give one or more examples of research interests relevant to AI existential safety:

Rebuilding AI on a new and broader foundation, with the goal of creating AI systems that are provably beneficial to humans.

University of Toronto

Prof. Tegan Maharaj

Why do you care about AI Existential Safety?

Life on earth has evolved over such a very long time, robust to such a huge range of earth conditions, that it’s easy to feel it will always be there, that it will continue in some way or another no matter what happens. But it might not. Everything *could* go wrong.

And in fact a lot is already going wrong — humans’ actions are changing the world more rapidly than it’s ever changed, and we are decimating the diversity of earth’s ecosystem. Time to adapt and multiple redundancy have been crucial to the adaptability and robustness of life in the past. AI systems afford the possibility of changing things even more rapidly, in ways we have decreasing understanding of and control over.

It’s not pleasant to think about everything going wrong, but once one accepts that it could, it sure feels better to try to do something to help make sure it doesn’t.

Please give one or more examples of research interests relevant to AI existential safety:

We are in a critical period of the development of AI systems, where we are beginning to see important societal issues with their use, but also great promise for societal good, generating widespread will to regulate & govern AI systems responsibly. I think there’s a real possibility of doing this right if we act now, and I hope to help make that happen.

These are my short (1-5 year) research foci:

(1) Theoretical results and experiments which help better understand robustness and generalization behaviour in more realistic settings, with a focus on representation learning and out-of-distribution data. E.g. average-case generalization and sample-complexity bounds, measuring OOD robustness, time-to-failure analysis, measuring ‘representativeness’.

(2) Practical methods for safe and responsible development of AI, with a focus on alignment and dealing with distributional shift. E.g. unit tests for particular (un)desirable behaviours that could enable 3rd-party audits, sandboxes for evaluating AI systems prior to deployment and guiding design of randomized control trials, generalization suites.

(3) Popularization and specification of novel problem settings, with baseline results, for AI systems addressing important societal problems (e.g. pricing negative externalities or estimating individual-level impact of climate change, pollution, epidemic disease, or polarization in content recommendation), with a focus on common-good problems.

Teesside University

Prof. The Anh Han

Why do you care about AI Existential Safety?

AI technologies can pose significant global risks to our civilization (which can even be existential), if not safely developed and appropriately regulated. In my research group, we have developed computational models (both analytic and simulated) that capture key factors of an AI development race, revealing which strategic behaviors regarding safety compliance would likely emerge in different conditions and hypothetical scenarios of the race, and how incentives can be used to drive the race in a more positive direction. This research is part of an FLI-funded AI Safety grant (https://futureoflife.org/2018-ai-grant-recipients/#Han).

For the development of suitable and realistic models, it is important to capture different scenarios and contexts of AI safety development (e.g., what is the relationship between safety technologies, AI capacity, and the level of risk of AI systems), so as to provide suitable regulatory actions. On the other hand, our behavioral modelling work informs, for example, what level of risk is acceptable without leading to unnecessary regulation (i.e., over-regulation).

I believe it’s important to be part of this community to learn about AI Safety research and to inform my own research agenda on AI development race/competition modelling.

Please give one or more examples of research interests relevant to AI existential safety:

My relevant research interest is to understand the dynamics of cooperation and competition of AI safety development behaviours (e.g., by companies, governments) and how incentives such as reward of safety-compliant behaviours and punishment of non-compliant ones can improve safety behaviour.
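A purely illustrative toy version of this kind of question (assumed payoffs and a deliberately simplified two-strategy setting, far simpler than the models in the publications listed below): replicator dynamics for a population of developers choosing between safe and unsafe development, showing how a sanction on unsafe behaviour can shift the long-run share of safe developers.

```python
# Toy replicator-dynamics sketch (assumed payoffs, not from the cited
# papers): developers choose SAFE or UNSAFE development. UNSAFE has a
# higher base payoff (winning the race) but incurs an expected sanction.
# We integrate x' = x(1-x)(f_safe - f_unsafe) and observe how the
# sanction level changes the long-run fraction of safe developers.
import numpy as np

def simulate(sanction, benefit_unsafe=2.0, benefit_safe=1.0,
             x0=0.2, dt=0.01, steps=20000):
    x = x0  # fraction of SAFE developers in the population
    for _ in range(steps):
        f_safe = benefit_safe
        f_unsafe = benefit_unsafe - sanction
        x += dt * x * (1.0 - x) * (f_safe - f_unsafe)
        x = min(max(x, 0.0), 1.0)
    return x

for s in [0.0, 0.5, 1.5]:
    print(f"sanction={s:.1f} -> long-run safe fraction ~ {simulate(s):.2f}")
```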

Some of my relevant publications in this direction:

1) T. A. Han, L. M. Pereira, F. C. Santos and T. Lenaerts. To Regulate or Not: A Social Dynamics Analysis of an Idealised AI Race. Vol 69, pages 881-921, Journal of Artificial Intelligence Research, 2020.
Link to publication:
https://jair.org/index.php/jair/article/view/12225

2) T. A. Han, L. M. Pereira, T. Lenaerts and F. C. Santos. Mediating artificial intelligence developments through negative and positive incentives. PloS one 16.1 (2021): e0244592.
Link to publication:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0244592

3) T. A. Han, L. M. Pereira, T. Lenaerts. Modelling and Influencing the AI Bidding War: A Research Agenda. AAAI/ACM conference on AI, Ethics and Society, pages 5-11, Honolulu, Hawaii, 2019.
Link to publication:
https://dl.acm.org/doi/abs/10.1145/3306618.3314265

4) An article in The Conversation: https://theconversation.com/ai-developers-often-ignore-safety-in-the-pursuit-of-a-breakthrough-so-how-do-we-regulate-them-without-blocking-progress-155825?utm_source=twitter&utm_medium=bylinetwitterbutton

5) A preprint showing the impact of network structures on the AI race dynamics and safety behavioral outcome
Link: https://arxiv.org/abs/2012.15234

6) A preprint showing our analysis of a new proposal for AI regulation and governance through voluntary safety commitments
Link: https://arxiv.org/abs/2104.03741

University of Chicago

Prof. Victor Veitch

Why do you care about AI Existential Safety?

I’m generally concerned with doing work that has the greatest impact on human wellbeing. I think it’s plausible that we can achieve strong AI in the near-term future. This will have a major impact on the rest of human history – so, we should get it right. As a pleasant bonus, I find that working on AI Safety leads to problems that are of fundamental importance to our understanding of machine learning and AI generally.

Please give one or more examples of research interests relevant to AI existential safety:

My main current interest in this area is the application of causality to trustworthy machine learning. Informally, the causal structure of the world seems key to making sound decisions, and so causal reasoning must be a key component of any future AI system. Accordingly, determining exactly how causal understanding can be baked into systems – and in particular how this affects their trustworthiness – is key. Additionally, this research programme offers insight into near-term trustworthiness problems, which can offer concrete directions for development. For example, the tools of causal inference play a key role in understanding domain shift, the failures of machine-learning models under (apparently) benign perturbations of input data, and in explaining (and enforcing) the rationale for decisions made by machine learning systems. For a concrete example of this type of work, see here.

Carnegie Mellon University

Prof. Vincent Conitzer

Why do you care about AI Existential Safety?

AI systems control an ever growing part of our world. As a result, they will increasingly interact with each other directly, with little or no potential for human mediation. If each system stubbornly pursues its own objectives, this runs the risk of familiar game-theoretic tragedies – along the lines of the Tragedy of the Commons, the Prisoner’s Dilemma, or even the Traveler’s Dilemma – in which outcomes are reached that are far worse for every party than what could have been achieved cooperatively. However, AI agents can be designed in ways that make them fundamentally unlike strategic human agents. This approach is often overlooked, as we are usually inspired by our own human condition in the design of AI agents. But this approach has the potential to avoid the above tragedies in new ways. The price to pay for this, for us as researchers, is that many of our intuitions about game and decision theory, and even belief formation, start to fall short. Foundational research from the philosophy and game theory literatures provides a good starting point for pursuing this approach.

Please give one or more examples of research interests relevant to AI existential safety:

I direct the Foundations of Cooperative AI Lab (FOCAL) at Carnegie Mellon University. Our goal is to create foundations of game theory appropriate for advanced, autonomous AI agents – with a focus on achieving cooperation. Research directions include:

  • Designing preferences, beliefs, and identities for artificial intelligence
  • Open-source game theory and multilateral commitments
  • Foundations of multi-agent learning
  • Decision-theoretic foundations for game theory
  • Self-locating beliefs

Junior AI Researchers


University of Oxford

Lewis Hammond

Columbia University

Chad DeChant

University of Oxford

Michael Cohen

University of Oregon

Benjamin Smith

Universidad Complutense de Madrid

Pablo Antonio Moreno Casares

Oxford University

Ryan Carey

University of California, Berkeley

Sumeet Motwani

UC Berkeley

Kaylene Stocking

University of Connecticut

Aidan Kierans

University of California, Berkeley

Yaodong Yu

Stanford University

Jonathan Cefalu

Charlie Steiner

University of Alberta

Montaser Mohammedalamen

ETH Zurich

David Lindner

Oregon State University

Alex Turner

University of California, Berkeley

Scott Emmons

University of Gloucestershire

Nell Watson

New York University

Ethan Perez

University of Oxford

Eleonora Giunchiglia

Imperial College London

Francis Rhys Ward

Hong Kong University of Science and Technology

Pingchuan Ma

UNSW Canberra

Harriet Farlow

University of Bath

Aishwarya Gurung

School for Advanced Studies - University of Udine (Italy)

Michele Campolo

National Institute of Technology Karnataka

Allan Suresh

Santa Clara University

Brian Green

University of Tartu

Mykyta Baliesnyi

University of Oxford

Lewis Hammond

Why do you care about AI Existential Safety?

I care most of all about having the greatest possible positive impact on humanity (and life more generally), including its future generations. I also believe that if we succeed in developing sophisticated AI systems that are broadly more capable than humans, then this may pose an existential risk that warrants our immediate and careful attention. As a result, I work to try to ensure that AI and other powerful technologies are developed and governed safely and responsibly, both now and in the future.

Please give one or more examples of research interests relevant to AI existential safety:

My research concerns safety, control, and incentives in multi-agent systems and spans game theory, formal methods, and machine learning. Currently my efforts are focused on developing techniques to help rigorously identify or induce particular properties of multi-agent systems under their game-theoretic equilibria, especially those systems that operate in uncertain (partially known, partially observable, stochastic, etc.) environments. Examples of my recent, ongoing, and planned work include:

  • Reasoning about causality in games, which in turn can be used to help define agent incentives.
  • Automatically verifying or synthesising equilibria of multi-agent systems that induce particular properties.
  • Coordination and commitment mechanisms that encourage cooperation among self-interested agents.
  • Representing and learning preferences and constraints in both the single- and multi-agent settings.
  • Studying and shaping the learning dynamics of multiple agents, as modelled by evolutionary games.

Columbia University

Chad DeChant

Why do you care about AI Existential Safety?

Many technologies can pose a threat to humanity. The use of nuclear weapons and the misuse of biotechnology, for example, could pose such a threat. But these and most other dangers posed by technology are relatively well understood and their use or misuse is ultimately under the direct control of their creators. AI could pose a different kind of threat to the extent that it usurps the decision making and agency of humans. AI will also serve to magnify and accelerate more traditional threats, particularly by being used in autonomous weapons systems.

Please give one or more examples of research interests relevant to AI existential safety:

My current research is focused on enabling AI agents to accurately report, summarize, and answer questions about their past actions in natural language. The first step in controlling AI is knowing what it’s doing. And unless we’re going to follow and constantly supervise every AI agent at all times, we will have to rely on those agents themselves to tell us what they are doing. This may seem basic but I believe insufficient attention has been paid to developing this ability, which should be one of the foundations of AI safety. If we could rely on AI agents to accurately report in an understandable manner what they have done and what they plan to do, that would go a long way toward addressing many AI safety concerns.

University of Oxford

Michael Cohen

Why do you care about AI Existential Safety?

Without special design choices, advanced artificial agents planning actions over the long term in an unknown environment are likely to intervene in any physical system we set up that has the purpose of producing observations for the agent that it is programmed to treat as informative about its goal. Such tampering would likely lead to the extinction of biological life.

Please give one or more examples of research interests relevant to AI existential safety:

I am interested in how to design safe advanced artificial agents in theory and then how to construct tractable versions.

University of Oregon

Benjamin Smith

Why do you care about AI Existential Safety?

AI Alignment is an existential issue with a substantial risk of cutting off humanity’s progress in the next hundred years. Whether you’re concerned about the well-being of the next few generations, or humanity’s long-term flourishing–and I’m concerned about both–AI Existential safety is a critical issue we must get right to maximize either objective. As a postdoctoral researcher in psychology, specializing in neuroimaging and computational modeling of decision-making and motivational systems, I want to explore potential bridges I can build between neuroscience and AI Alignment. There are likely to be important lessons to learn from neuroscience and psychology for AI Alignment, and I want to understand what those are and help others understand them.

Please give one or more examples of research interests relevant to AI existential safety:

My primary overlapping interest in this area is in multi-objective utility and decision-making. I’m interested in how biological agents and artificial agents, particularly agents using reinforcement learning, can trade off multiple objectives. For artificial agents, I believe a system that can competently trade off multiple objectives will be less likely to be misaligned, or will be less misaligned, than any system that is trained on a single objective. Humans align to multiple objectives, as do other biological organisms, so any human-aligned system needs to appropriately balance those objectives. Further, any measure of human preferences is an external measure that needs to operationalize preferences, and misalignment is less likely if multiple operationalizations (e.g., revealed vs. expressed preferences) are balanced against each other. Last year, I published a paper arguing that multiple objectives were necessary to model empirically-observed behavior in seemingly-monotonic objectives in rats. I am currently working on two papers directly related to AI safety. The first is with Roland Pihlakas and Robert Klassert. In contrast to previous approaches using a lexicographic ordering to trade off safety and performance objectives, we experimented with a non-linear, log-exponential trade-off that allows negative outcomes in some objectives for large positive outcomes in others, but only where positive objectives vastly outweigh negative ones. I’m also working with Peter Vamplew on a response paper “Single-objective reward is not enough”, explaining the importance of multi-objective reinforcement in biological systems.
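As a rough illustration of the kind of non-linear trade-off described above (an assumed toy functional form, not the exact one from the paper with Pihlakas and Klassert): gains on each objective count only logarithmically while losses are penalised exponentially, so a loss in one objective is acceptable only when the gains elsewhere vastly outweigh it.

```python
# Toy sketch (assumed functional form, not the published one): a
# log-exponential aggregation of per-objective rewards in which negative
# outcomes are penalised exponentially and positive outcomes count only
# logarithmically, so losses must be vastly outweighed to be acceptable.
import numpy as np

def log_exp_utility(rewards, loss_aversion=2.0):
    r = np.asarray(rewards, dtype=float)
    gains = np.log1p(np.maximum(r, 0.0))                     # concave in gains
    losses = np.expm1(loss_aversion * np.maximum(-r, 0.0))   # convex in losses
    return float(np.sum(gains - losses))

# A linear sum would rank these two outcomes equally (both sum to 8),
# but the non-linear aggregation strongly prefers avoiding the loss.
print(log_exp_utility([4.0, 4.0]))    # balanced gains
print(log_exp_utility([12.0, -4.0]))  # large gain, large loss elsewhere
```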

Universidad Complutense de Madrid

Pablo Antonio Moreno Casares

Why do you care about AI Existential Safety?

I think creating or understanding AGI is one of the most important scientific endeavors of our time. It is fascinating in its own right and has the potential to greatly improve everyone’s lives. However, we must make sure that this transition goes well and that we are capable of making advanced AI systems understand what we want. Just as we cannot manually specify the behavior of advanced AI systems, we should not expect to be able to write down the specific objectives they should pursue. For that reason, we need to research how to make AI systems safer.

Please give one or more examples of research interests relevant to AI existential safety:

At the moment I am quite enthusiastic about using causality to ensure AI systems have the correct instrumental incentives. The work from causalincentives.com is very relevant here. I am also interested in how to do causal representation learning of human preferences, and whether it can make systems more robust and interpretable.

Oxford University

Ryan Carey

Why do you care about AI Existential Safety?

The chances of human-level AI in the next few decades are high enough to be concerning, and it seems worthwhile to investigate how AI systems could be better aligned to human values.

Please give one or more examples of research interests relevant to AI existential safety:

I am interested in understanding the incentives of AI systems. This has included using causal models to model AI-environment interactions. Using graphs, we can study what variables an AI system might be incentivised to influence or respond to. This in turn can help us to understand whether or not an optimal system will behave safely.
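As a toy illustration of the graphical idea (a simplified sketch with a hypothetical diagram, not the full formalism of causal influence diagrams): a variable can carry a control incentive only if it lies on a directed path from the agent’s decision to its utility node.

```python
# Minimal sketch: in a causal influence diagram, a variable X can carry an
# instrumental control incentive only if X lies on a directed path from the
# decision node to the utility node.

def descendants(graph, node):
    """All nodes reachable from `node` via directed edges."""
    seen, stack = set(), [node]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def potential_control_incentives(graph, decision, utility):
    """Variables lying on some directed path decision -> X -> utility."""
    from_decision = descendants(graph, decision)
    return {x for x in from_decision
            if utility in descendants(graph, x) or x == utility}

# Hypothetical toy diagram: the agent's action influences the world state and,
# separately, the sensor reading from which its utility is computed.
diagram = {
    "action": ["world_state", "sensor"],
    "world_state": ["utility"],
    "sensor": ["utility"],
}
print(potential_control_incentives(diagram, "action", "utility"))
# -> {'world_state', 'sensor', 'utility'}: the agent may be incentivised to
#    influence its own sensor, a classic wireheading-style concern.
```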

University of California, Berkeley

Sumeet Motwani

Why do you care about AI Existential Safety?

AI poses a significant existential risk to humanity, but it also promises significant benefits. Ensuring safety as the field of AI progresses makes possible a future in which we can rely on AI to solve some of the world’s most important problems while it becomes a vital part of our daily lives.

Please give one or more examples of research interests relevant to AI existential safety:

I’m currently working on topics such as Power Seeking AI, AI Alignment, and ML for Security.

UC Berkeley

Kaylene Stocking

Why do you care about AI Existential Safety?

I believe AI technology is likely to be one of the biggest (if not the biggest) factors shaping humanity’s long-term future. I’m not sure whether existential risk from AI will become a problem in my lifetime or even several lifetimes from now, but given how uncertain we are about the rate of progress towards AGI, I think it’s a good idea to start thinking seriously, as soon as possible, about what kind of future we want and what role AI will play in it. Also, I think AI (as opposed to other important existential risks) is well-aligned with my skills and interests, making it the most likely place where I can have a positive impact with my research.

Please give one or more examples of research interests relevant to AI existential safety:

I am interested in how we might give AI systems the ability to reason with explicit causal hypotheses, which should make it easier for humans to audit AI-based decisions, and decrease the risk of mistakes due to problems like causal confusion or the AI system failing to take into account the impact of its own decisions on its dynamic environment.

University of Connecticut

Aidan Kierans

Why do you care about AI Existential Safety?

Artificial intelligence will be the most important technology of the 21st century. As with any powerful technology, there is a risk it could be misused or mismanaged. I developed my passion for AI research while exploring how I could best use my technical and analytical skills in the public interest. To this end, I earned concurrent bachelor’s degrees in computer science and philosophy and have devoted much of my free time to extracurricular research in both fields. With the knowledge and skills I’ve developed in both fields, I am more confident than ever that AI poses non-negligible and unacceptable existential and catastrophic risks. I aim to chip away at these risks.

Please give one or more examples of research interests relevant to AI existential safety:

I am interested in methods for measuring and producing intelligence and honesty in AI. My current research project, titled “Quantifying Misalignment Between Agents,” defines and models “misalignment” in relation to agents and their goals. Following from this, I would like to develop methods of qualitative analysis that can describe an agent’s intelligence and honesty, then follow up with quantitative benchmarks for honest AI. Incorporating the methods and knowledge I applied in my undergraduate epistemology research, I would investigate what it means for an AI system to hold beliefs and how we can ensure that those beliefs are being expressed in good faith. Answering these questions would move us closer to capable, trustworthy AI.

University of California, Berkeley

Yaodong Yu

Why do you care about AI Existential Safety?

Since AI systems are likely to outperform humans in many intellectual domains in the next few decades, I feel highly motivated to investigate how to avoid potential undesired outcomes and understand failure cases of powerful AI systems, for the purpose of ensuring safety of such AI technologies.

Please give one or more examples of research interests relevant to AI existential safety:

In order to build trustworthy machine learning systems, we first need to understand when and why machine learning models perform well or badly. However, we still lack a fundamental understanding of the principles underlying generalization, optimization, and neural network architecture design for reliable and robust machine learning. My research interests lie in understanding the theoretical foundations of “fragile” machine learning systems and developing principled approaches for robust machine learning. Current topics that I am working on include: (1) a theoretical framework for out-of-distribution generalization; (2) min-max optimization for robust machine learning; (3) uncertainty quantification with formal guarantees.
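As an illustration of the min-max theme, here is a compact sketch of a standard adversarial-training step (a generic example, not a description of this specific research; the model, data, and hyperparameters are placeholders): the inner loop approximately maximizes the loss over bounded input perturbations, and the outer step minimizes that worst-case loss.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.1, alpha=0.02, steps=10):
    """Inner maximization: find a worst-case perturbation within an L-infinity ball."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # ascend on the loss
            delta.clamp_(-eps, eps)              # stay inside the perturbation ball
        delta.grad.zero_()
    return delta.detach()

def robust_training_step(model, optimizer, x, y):
    """Outer minimization: update the model on adversarially perturbed inputs."""
    delta = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x + delta), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# Tiny usage example on random data (hypothetical shapes).
model = torch.nn.Linear(20, 3)
opt = torch.optim.SGD(model.parameters(), lr=0.05)
x, y = torch.randn(32, 20), torch.randint(0, 3, (32,))
print(robust_training_step(model, opt, x, y))
```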

Stanford University

Jonathan Cefalu

Why do you care about AI Existential Safety?

Both my work and the philanthropic contributions my wife & I make are dedicated to reducing x-risk and s-risk. Our x-risk work is focused on funding AGI alignment and advocating for nuclear disarmament, and our s-risk work is focused on the reduction of wild animal suffering as well as phasing out factory farming. In order to get more funding into the field of AGI alignment, I started a company called Preamble (preamble.com) which is focused in the near term on recommender system alignment, but in the future will focus on AGI alignment as AGI becomes closer on the horizon. I am a father and I care deeply about creating a world where my young son does not have to fear AGI x-risk.

Please give one or more examples of research interests relevant to AI existential safety:

I am the lead researcher on a project hoping to prove (via simulations) that severe x-risk would inevitably arise from automating strategic nuclear warfare, with the aim of persuading all nations that any attempt to automate strategic warfare would harm their own self-interest by increasing global x-risk. In addition to this original research, I have dedicated significant time and funding to lobbying the US government to strengthen US policies prohibiting AI-based automation of nuclear warfare. Through this work I have built a network of contacts in the Dept. of Defense who are beginning to understand that military command automation may be self-defeating and should be significantly limited. I have helped shape US doctrine around this topic, including contributing to a public statement by the NSCAI (nscai.gov) that “The United States should make a clear, public statement that decisions to authorize nuclear weapons employment must only be made by humans, not by an AI-enabled or autonomous system, and should include such an affirmation in the DoD’s next Nuclear Posture Review.”

Charlie Steiner

Why do you care about AI Existential Safety?

It’s a rich vein of interesting philosophical and technical problems that also happens to be urgently vital for realizing the long-term potential of the human race.

Please give one or more examples of research interests relevant to AI existential safety:

I’m interested in how to make conceptual progress on the problem of value learning, and how to translate that progress to motivate experiments that can be carried out today using language models or model-based reinforcement learning. An example interest for conceptual progress would be how to translate values and policies between different learned ontologies.

University of Alberta

Montaser Mohammedalamen

Why do you care about AI Existential Safety?

One challenge in the field of artificial intelligence is to design agents that avoid doing harm or being destructive. This is especially relevant in Reinforcement Learning (RL), where an agent is trained by trial and error to achieve one or more goals, represented by a reward function that rewards the agent when it reaches a goal and penalizes it when it fails to do so. Deploying such an agent is very challenging: if the deployment environment is not identical to the training environment, the agent may fail to achieve the desired goal or, worse, cause catastrophic outcomes.

Please give one or more examples of research interests relevant to AI existential safety:

Existing approaches for safety in RL often specify safe behavior via constraints that an agent must not violate. Broadly, this amounts to formulating tasks as a constrained Markov decision process (MDP). A constrained MDP can be solved with RL in a model-based or model-free way. However, this approach requires pre-defining the safe states the agent is allowed to visit or the safe actions it can take. Alternatively, some approaches design “safety functions” that incentivize pre-defined safe behaviors. These approaches require an a priori description of safety information about specific scenarios and present a scaling problem, as it is generally infeasible to enumerate all potentially hazardous situations in a realistic application. Our research goal is to develop agents that learn to behave cautiously in novel situations (“Learning To Be Cautious”). An agent that could learn to be cautious would overcome this challenge by discovering for itself when and how to behave cautiously. Our approach characterizes reward function uncertainty without task-specific safety information (using a neural network ensemble) and uses this uncertainty to construct a robust policy (via robust policy optimization). Specifically, we construct robust policies with a k-of-N counterfactual regret minimization (CFR) subroutine. We validate our approach on a set of toy tasks that intuitively illustrate caution, in the spirit of AI Safety Gridworlds, as a sequence of increasing challenges for learning cautious behavior. Our approach exhibits caution in each of these tasks without any task-specific safety tuning. The method identifies and adopts cautious behavior in tasks where cautious behavior is increasingly non-obvious, starting from a one-shot environment (a contextual bandit) with an obvious cautious action, proceeding to one-shot environments with cautious actions that depend on context, and concluding with a gridworld driving environment that requires long-term planning.
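A toy numerical illustration of the underlying pessimism idea (a rough analogue only: the actual work uses a learned reward ensemble and a k-of-N CFR subroutine, whereas the “models” below are random linear heads): score each action by its k most pessimistic reward estimates and act on that worst-case score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N reward "models" (random linear heads standing in for a
# trained neural-network ensemble) give differing reward estimates per action.
N_MODELS, N_ACTIONS, DIM = 10, 4, 8
ensemble = rng.normal(size=(N_MODELS, DIM))          # one reward model per row
action_features = rng.normal(size=(N_ACTIONS, DIM))  # feature vector per action

def cautious_action(k):
    """Pick the action with the best average reward over its k worst models.

    Rough analogue of a k-of-N robustness criterion: smaller k means the agent
    trusts only its most pessimistic reward estimates, i.e. behaves more cautiously.
    """
    estimates = action_features @ ensemble.T          # shape (actions, models)
    worst_k = np.sort(estimates, axis=1)[:, :k]       # k lowest estimates per action
    return int(np.argmax(worst_k.mean(axis=1)))

print("cautious (k=2):", cautious_action(2))
print("neutral  (k=N):", cautious_action(N_MODELS))   # plain mean over the ensemble
```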

ETH Zurich

David Lindner

Why do you care about AI Existential Safety?

I think there is a non-negligible chance that we will develop very capable AI systems in the next decades that could pose an existential risk, and I believe that there is research we can do today to reduce this risk significantly. Such research should have a high priority because reducing such existential risk even a bit seems to have huge expected value.

Please give one or more examples of research interests relevant to AI existential safety:

Currently I’m interested in AI alignment research, specifically in the context of reinforcement learning. Most of my work is focused on improving the sample efficiency of reward learning methods that allow us to design reinforcement learning agents which learn from human feedback instead of a specified reward function. I think this research is relevant for AI existential safety because many existential risks come from the difficulty of specifying objectives for very capable systems. If we instead want systems to learn from human preferences, it is crucial that such systems are scalable and remain competitive, and therefore that learning from human preferences becomes more sample efficient.
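For context, a minimal sketch of the basic preference-based reward learning setup that sample-efficiency work builds on (the synthetic data, linear reward model, and plain gradient ascent here are illustrative placeholders, not methods from this research): fit a reward function so that the Bradley-Terry model assigns high likelihood to the observed pairwise human preferences.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: trajectory feature vectors and pairwise preferences,
# where a pair (i, j) means trajectory i was preferred to trajectory j.
features = rng.normal(size=(50, 6))
true_w = rng.normal(size=6)
prefs = []
for _ in range(200):
    i, j = rng.choice(50, size=2, replace=False)
    if features[i] @ true_w < features[j] @ true_w:
        i, j = j, i
    prefs.append((i, j))

def fit_reward(features, prefs, lr=0.1, steps=500):
    """Fit a linear reward model under the Bradley-Terry preference likelihood:
    P(i preferred to j) = sigmoid(r(i) - r(j))."""
    w = np.zeros(features.shape[1])
    for _ in range(steps):
        grad = np.zeros_like(w)
        for i, j in prefs:
            diff = features[i] - features[j]
            p = 1.0 / (1.0 + np.exp(-(diff @ w)))
            grad += (1.0 - p) * diff      # gradient of the log-likelihood
        w += lr * grad / len(prefs)       # gradient ascent
    return w

w_hat = fit_reward(features, prefs)
corr = np.corrcoef(features @ w_hat, features @ true_w)[0, 1]
print(f"correlation with true reward: {corr:.2f}")
```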

Oregon State University

Alex Turner

Why do you care about AI Existential Safety?

AI Existential Safety seems like a fork in the road for humanity’s future. AI is a powerful technology, and I think it will go very wrong by default. I think that we are on a “hinge of history”: in retrospect, this century may be considered the most important century in human history. We still have time on the clock to make AGI go right. Let’s use it to the fullest.

Please give one or more examples of research interests relevant to AI existential safety:

I’m currently most interested in the statistical behavioral tendencies of different kinds of AI reasoning and training regimes. For example, when will most trained agent policies be power-seeking? What actions do expected-utility-maximizing agents tend to take? I have developed and published a formal theory that has begun to answer these questions.

University of California, Berkeley

Scott Emmons

Why do you care about AI Existential Safety?

COVID-19 shows how important it is to plan ahead for catastrophic risk.

Please give one or more examples of research interests relevant to AI existential safety:

I’ve done work on the game theory of value alignment and the robustness of reinforcement learning.

University of Gloucestershire

Nell Watson

Why do you care about AI Existential Safety?

AI is a powerful amplifier, one that may be applied to infinite purposes. It is therefore a steroid for existential risks, as well as a risk in itself due to misalignment and supernormal stimuli. Protecting the future against the excesses of AI is probably the biggest question of our time, and perhaps the only intellectual domain that truly matters in the long term.

Please give one or more examples of research interests relevant to AI existential safety:

I have been working in the space of AI and AI Ethics for many years, having founded a machine vision company and having taught machine intelligence for a variety of higher-ed clients, including creating courseware on Convolutional Neural Networks for O’Reilly Media and courseware on AI Ethics for IEEE and Coursera. I enjoy outreach and public education in science and policy, and I have given talks and lectures on Artificial Intelligence and Ethics all over the world, on behalf of organizations such as MIT and the World Bank. I have also co-developed the Certified Ethical Emerging Technologist professional examination for CertNexus, and served as an executive consultant philosopher for Apple. In addition, I have initiated CulturalPeace.org, working to bridge polarization in society through basic ground rules for conflict; Endohazard.org, aiming to better inform the public about which products or components contain endocrine-disrupting chemicals; Pacha.org, exploring how we can manage shifted costs in an intelligent and automated manner; Slana.org, on leveraging entheogenic treatments within conflict zones; and a forthcoming IEEE standard of audio and visual marks denoting whether one is engaging with a human, an AI, or a ‘centaur’ combination.

New York University

Ethan Perez

Why do you care about AI Existential Safety?

AI has done great good and great harm in the world already, and the potential benefits and harms will only grow as we develop more capable systems. The amount of harm or good done by AI often depends on how similar its training objective is to the objective we actually care about; the greater the misalignment, the greater the harm (including the possibility of existential catastrophes). I’m interested in reducing such misalignment by developing training objectives that better capture what we actually care about, even though such objectives are often hard to quantify and evaluate. In particular, the aim of my research is to train AI to tell us novel, true statements about the world rather than the human-like statements current systems produce. In doing so, I also hope that we learn insights about AI alignment that are useful more broadly for maximizing the good and minimizing the harm from AI.

Please give one or more examples of research interests relevant to AI existential safety:

My research focuses on aligning language models with human preferences, e.g., for content that is helpful, honest, and harmless. In particular, I am excited about developing learning algorithms that outdo humans at generating such content, by producing text that is free of social biases, cognitive biases, common misconceptions, and other limitations.

University of Oxford

Eleonora Giunchiglia

Why do you care about AI Existential Safety?

AI is becoming increasingly ubiquitous, and it is likely to be applied in almost every aspect of our lives in the next few decades. However, the careless application of AI-based models in the real world can have (and, to some extent, has already had!) disastrous consequences. As AI researchers, I believe it is our responsibility to develop novel AI models that can be deemed safe and trustworthy, and hence can be applied reliably in the real world.

Please give one or more examples of research interests relevant to AI existential safety:

My research focuses on how to create safer and more trustworthy deep learning models via the exploitation of logical constraints. The goal of my research is to develop models that (i) are guaranteed by construction to always be compliant with a given set of requirements, expressed as logical constraints, and (ii) have a human-like understanding of the world due to the exploitation of the background knowledge expressed by the constraints.
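A toy sketch of what “compliant by construction” can look like (an illustrative example with hypothetical labels, not the specific architectures from this research): the output layer is defined so that an implication constraint such as “dog implies animal” can never be violated, whatever the network’s raw outputs.

```python
import torch

# Toy constraint set: predicting a subclass logically implies its superclass
# (e.g. "dog" implies "animal"). Hypothetical label indices.
LABELS = {"animal": 0, "dog": 1, "cat": 2}
IMPLIES = [(1, 0), (2, 0)]   # (child, parent) pairs

def constrained_output(logits):
    """Map raw logits to probabilities that satisfy the implication constraints
    by construction: each parent's probability is lifted to at least the
    maximum probability of its children."""
    probs = torch.sigmoid(logits)
    out = probs.clone()
    for child, parent in IMPLIES:
        out[:, parent] = torch.maximum(out[:, parent], out[:, child])
    return out

raw = torch.tensor([[-2.0, 3.0, -1.0]])   # confident "dog", low raw "animal"
print(constrained_output(raw))
# "animal" is now at least as probable as "dog", so the constraint
# dog -> animal can never be violated by thresholding the outputs.
```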

Imperial College London

Francis Rhys Ward

Why do you care about AI Existential Safety?

I consider myself an effective altruist and long-termist. That is, I believe that the future of humanity is incredibly valuable and that AI is (probably) the most important influence on how the long-term future goes. I also think that we can make progress on both technical and societal problems related to AI in order to reduce existential risk and, more generally, increase the likelihood of positive futures.

Please give one or more examples of research interests relevant to AI existential safety:

The current focus of my PhD relates to the incentives that AI agents have to manipulate humans, especially in the multi-agent reward learning setting. I recently had a paper on this topic accepted to the Coop AI workshop and as an extended abstract to AAMAS.

Hong Kong University of Science and Technology

Pingchuan Ma

Why do you care about AI Existential Safety?

Today, AI techniques have become cornerstones of real-world applications, even in safety-critical scenarios. Despite this success, several key challenges remain: AI systems may deceive human perception and cause catastrophes in decision-making. Indeed, abuse of unreliable AI techniques could pose existential risks to humanity. Therefore, before designing ever more sophisticated AI models, it is important to ensure that AI would not threaten existential safety.

Please give one or more examples of research interests relevant to AI existential safety:

Looking at the big picture of AI Existential Safety, my research thrusts include 1) advanced data analysis solutions with better reliability and interpretability; 2) systematic benchmarking schemes for AI models; 3) applications of AI in mitigating cybersecurity threats.

UNSW Canberra

Harriet Farlow

Why do you care about AI Existential Safety?

My professional background has spanned consulting, academia, a tech start-up and Defence. All of these have been quite different in many ways, but they had one thing in common: they were all grappling with how to respond to new technologies – from the selling side, the buying side, the research side, or the implementation side. I have seen Artificial Intelligence from a lot of sides now, and I’m excited but apprehensive. AI presents many opportunities, and it is already being adopted rapidly, but we still have a long way to go to ensure we implement it accurately, safely and securely. I am currently a PhD candidate in Cyber Security looking at adversarial machine learning – the ability to ‘hack’ machine learning models – and I hope that my research will advance the field of AI Safety to ensure the inevitable rise of AI benefits, rather than harms, our communities.

Please give one or more examples of research interests relevant to AI existential safety:

Adversarial machine learning, Model evasion, Model optimization, Game theory, Cyber security, and AI bias and fairness.

University of Bath

Aishwarya Gurung

Why do you care about AI Existential Safety?

I think we should do everything we can to minimize the burden for future generations.

Please give one or more examples of research interests relevant to AI existential safety:

I am broadly interested in Artificial General Intelligence Safety, AI Governance and forecasting future tech-security issues relevant for policymaking.

School for Advanced Studies - University of Udine (Italy)

Michele Campolo

Why do you care about AI Existential Safety?

Due to the great potential that technological development and AI itself might unlock for humanity, reducing existential risk is mandatory. Studying the risks posed by AI allows us to understand possible future scenarios and increase the chances of a positive outcome.

Please give one or more examples of research interests relevant to AI existential safety:

At AI Safety Camp 2020 I worked in a team on the topic of goal-directedness. In the AI Alignment community, it has been argued that goal-directed agents might pose a greater risk than other agents whose behaviour is not strongly driven by external goals. We investigated this claim and produced a literature review on goal-directedness, published on the Alignment Forum. At CEEALAR I have spent two years studying AI Alignment itself and the literature on AGI, to better understand the main characteristics of systems that possess general intelligence, their potential benefits, and the role they might play in catastrophic scenarios. I am currently working on an AI Alignment project that takes inspiration from some philosophical ideas in the field of metaethics. Even though the relation between metaethics and AI Alignment has been recognised by many different thinkers, some of whom were also guests on the AI Alignment Podcast by FLI, there are not many projects on this topic in the field of AI Safety. When completed, the research project might show a new path to AI Alignment, enabling the design of artificial intelligence that is aligned not only with human values, but with all sentient life.

National Institute of Technology Karnataka

Allan Suresh

Why do you care about AI Existential Safety?

I’ve always wanted to do work that positively affects our future. Up until a year ago, my goal was to use AI in research addressing climate change. It was then that I came across the effective altruism forum. As I learned more about longtermism beyond climate change, I began to realise that my skills could be developed and put to better use in the field of AI Safety, all the more because the field itself is talent-limited. I also feel that AI Safety is the most pressing problem among long-term risk issues.

Please give one or more examples of research interests relevant to AI existential safety:

I am currently in GCRI’s Research Collaboration and Mentorship Program, doing a project under Seth Baum. My interests mostly lie in Value Learning and Inverse Reinforcement Learning, as well as Deep Reinforcement Learning.

Santa Clara University

Brian Green

Why do you care about AI Existential Safety?

I care about AI existential safety because human life, culture, and civilization are worth protecting. I have dedicated my life to helping people make better decisions, especially in the realm of technology and AI. Humankind is rapidly growing in power and this growth poses a fundamentally ethical problem: how should we use this power? What are the proper and improper uses of this power? Artificial intelligence takes the most human of capacities – our intelligence – and casts it into the world as machines and programs that can act autonomously, in increasingly more general ways. Power gives us the capacity to act, intelligence tells us which actions might be desirable, and ethics can help to tell us which desirable actions might actually be good. As a technology ethicist at the Markkula Center for Applied Ethics I have worked directly and extensively with the Partnership on AI, the World Economic Forum, the Vatican, fellow academics, and corporations together worth well over $4 trillion. My goal is to equip people with tools for thinking about ethical problems, so they can make better decisions related to technology in general, and AI in particular, and through their decisions create a better future.

Please give one or more examples of research interests relevant to AI existential safety:

I have very broad interests in the topic of AI existential safety and I engage the topic through four main paths. The first path is direct: I work on issues immediately relevant to problems and solutions on AI risk and safety. The second path is training current AI practitioners, future practitioners, and others involved in training and educating practitioners, in practical ethical tools for technology. This includes considering questions on the full spectrum of safety from short to long term. The third path is an adaptation strategy towards risk rather than a mitigation strategy, and involves creating refuges from existential risks, both on and off of the Earth. My fourth path is through cultural institutions such as the Vatican and other organizations which need to learn more about AI and the dangers and opportunities that it poses to the world. I have worked extensively with the Pontifical Council for Culture which seeks to promote a broad cross-cultural dialogue on AI.

University of Tartu

Mykyta Baliesnyi

Why do you care about AI Existential Safety?

Like most people, I want humanity to keep existing and to live better with time. AI x-risk is one of the key threats to that desire, both on its own and as an amplifier of other threats. On the flip side, resolving it is a unique opportunity to do great good for the world.

Please give one or more examples of research interests relevant to AI existential safety:

Here are two examples of research directions I am interested in:

(a) More efficient reward learning. A key approach for learning the reward function in RL is inverse reinforcement learning (IRL). Recent work on combining learning from demonstrations with active preference elicitation has improved upon IRL’s sample efficiency by using only a few demonstrations as a prior and then “homing in” on the true reward function by asking preference queries. It would be interesting to frame this setup as a meta-learning problem, where the final reward recovered through costly preference learning is used to improve the prior during training, reducing the number of preference queries necessary at test time.

(b) Task specification in language models. With language models ever growing in size and capability, aligning them with our intentions efficiently is an important and urgent problem. Large LMs can often be steered to be helpful more effectively through in-context prompts than through fine-tuning; but the prompt itself occupies useful space in the transformer’s context window, restricting how much we can influence the model’s behavior. There has been great progress in this direction, e.g. by distilling prompts back into the model, but there is still a lot of room for exploration. For example, it would be interesting to distill larger prompts iteratively, in small parts that fit in memory, to leverage smaller but very high-quality demonstrations (a minimal sketch of the basic distillation step is given below).
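A minimal sketch of the basic prompt-distillation step (in the spirit of the context-distillation idea in the works cited below, but using a tiny stand-in model and made-up sizes rather than a real LM): the student is trained, without seeing the prompt, to match the next-token distribution of a teacher that does see it.

```python
import torch
import torch.nn.functional as F

# Toy stand-in for a language model: next-token logits over a small vocabulary
# given a context of token ids (hypothetical sizes; a real setup would use an LLM).
VOCAB, DIM = 100, 32

class TinyLM(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(VOCAB, DIM)
        self.head = torch.nn.Linear(DIM, VOCAB)

    def forward(self, tokens):                            # tokens: (batch, seq)
        return self.head(self.emb(tokens).mean(dim=1))    # next-token logits

teacher = TinyLM()                    # frozen model that is conditioned on the prompt
student = TinyLM()                    # model we distill the prompt into
student.load_state_dict(teacher.state_dict())
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

prompt = torch.randint(0, VOCAB, (1, 16))   # the behaviour-steering prompt

def distill_step(context):
    """Train the student (without the prompt) to match the teacher's
    next-token distribution when the teacher does see the prompt."""
    with torch.no_grad():
        target = F.softmax(teacher(torch.cat(
            [prompt.expand(context.shape[0], -1), context], dim=1)), dim=-1)
    log_pred = F.log_softmax(student(context), dim=-1)
    loss = F.kl_div(log_pred, target, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

print(distill_step(torch.randint(0, VOCAB, (8, 24))))
```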

Amanda Askell et al. A General Language Assistant as a Laboratory for Alignment. 2021. arXiv:2112.00861.
Erdem Bıyık et al. Learning Reward Functions from Diverse Sources of Human Feedback: Optimally Integrating Demonstrations and Preferences. 2021. arXiv:2006.14091.
Yanda Chen et al. Meta-learning via Language Model In-context Tuning. 2021. arXiv:2110.07814.
Andrew Y. Ng and Stuart Russell. Algorithms for Inverse Reinforcement Learning. 2000.

OPEN FOR APPLICATIONS