
Podcast: Concrete Problems in AI Safety with Dario Amodei and Seth Baum

Published August 30, 2016

Many researchers in the field of artificial intelligence worry about potential short-term consequences of AI development. Yet far fewer want to think about the long-term risks from more advanced AI. Why? To start to answer that question, it helps to have a better understanding of what potential issues we could see with AI as it's developed over the next 5-10 years. And it helps to better understand the concerns actual researchers have about AI safety, as opposed to fears often brought up in the press.

We brought on Dario Amodei and Seth Baum to discuss just that. Amodei, who now works with OpenAI, was the lead author on the recent, well-received paper Concrete Problems in AI Safety. Baum is the Executive Director of the Global Catastrophic Risk Institute, where much of his research is also on AI safety.

Not in a good spot to listen? You can always read the transcript here.

If you're still new to or learning about AI, the following terminology might help:

Artificial Intelligence (AI): A machine or program that can learn to perform cognitive tasks similar to those performed by the human brain. Typically, the program, or agent, is expected to be able to interact with the real world in some way without constant supervision from its creator. Microsoft Office is considered an ordinary computer program because it will do only what it is explicitly programmed to do. Siri is considered by most to be a very low-level AI because it must adapt to its surroundings, respond to a wide variety of owners, and understand a wide variety of requests, not all of which can be programmed for in advance. Levels of artificial intelligence fall along a spectrum:

  • Narrow AI: This is an artificial intelligence that can only perform a specific task. Siri can look up anything on a search engine, but it can’t write a book or drive a car. Google’s self-driving cars can drive you where you want to go, but they can’t cook dinner. AlphaGo can beat the world’s best Go player, but it can’t play Monopoly or research cancer. Each of these programs can do the task it was designed for as well as, or better than, humans, but they don’t come close to the breadth of capabilities humans have.
  • Short-term AI concerns: The recent increase in AI development has many researchers concerned about problems that could arise in the next 5-10 years. Increasing autonomy will impact the job market and potentially income inequality. Biases, such as sexism and racism, have already cropped up in some programs, and people worry this could be exacerbated as AIs become more capable. Many wonder how we can ensure control over systems after they’ve been released to the public, as seen with Microsoft’s problems with its chatbot Tay. Transparency is another issue that’s often brought up: as AIs learn to adapt to their surroundings, they’ll modify their programs for increased efficiency and accuracy, and it will become increasingly difficult to track why an AI took a particular action. These are some of the more commonly mentioned concerns, but there are many others.
  • Advanced AI and Artificial General Intelligence (AGI): As an AI program expands its capabilities, it will be considered advanced. Once it achieves human-level intelligence in terms of both capabilities and breadth, it will be considered generally intelligent.
  • Long-term AI concerns: Current expectations are that we could start to see more advanced AI systems within the next 10-30 years. For the most part, the concerns for long-term AI are similar to those of short-term AI, except that, as AIs become more advanced, the problems that arise could be far more damaging and destructive.
  • Superintelligence: AI that is smarter than humans in all fields.

Agent: A program, machine, or robot with some level of AI capabilities that can act autonomously in a simulated environment or the real world.

Machine Learning: An area of AI research that focuses on how an agent can learn from its surroundings, experiences, and interactions in order to improve how well it functions and performs its assigned tasks. With machine learning, the AI adapts to its environment without the need for additional programming. AlphaGo, for example, was not programmed to be better than humans from the start. None of its programmers were good enough at the game of Go to compete with the world’s best. Instead, it was programmed to play many games of Go with the intent to win. Each time it won or lost a game, it learned more about how to win in the future.

Training: These are the iterations a machine-learning program must go through in order to learn how to better meet its goal by making adjustments to the program’s settings. In the case of AlphaGo, training involved playing Go over and over.
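
To make the idea concrete, here is a rough sketch (ours, not from the episode or the paper) of what a training loop can look like: a single setting is nudged over many iterations to reduce an error score, which plays the role of the objective function defined further below. The data points and step size are invented purely for illustration.

```python
# A minimal sketch of training: repeatedly adjust one setting (the slope "w")
# to reduce the error on some example data. The data points and step size are
# invented purely for illustration.

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # (input, desired output) pairs

def error(w):
    """Squared difference between predictions (w * x) and the desired outputs."""
    return sum((w * x - y) ** 2 for x, y in data)

w = 0.0        # the program's initial setting
step = 0.01    # how far to nudge the setting each iteration

for iteration in range(1000):
    # Try nudging w in both directions and keep whichever value reduces the error.
    w = min([w - step, w, w + step], key=error)

print(f"learned slope: {w:.2f}, remaining error: {error(w):.4f}")
```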

Neural Networks (Neural Nets) and Deep Neural Nets: Neural nets are programs that were inspired by the way the central nervous system of animals processes information, especially with regard to pattern recognition. These are important tools within a machine learning algorithm that can help the AI process and learn from the information it receives. Deep neural nets have more layers of complexity.
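
As a rough illustration (again ours, not from the episode), the sketch below runs a single forward pass through a tiny two-layer network in NumPy. The weights are random placeholders rather than a trained model; a "deep" network simply stacks more layers like these.

```python
import numpy as np

# A toy two-layer neural network performing a single forward pass.
# The weights are random placeholders; in practice they are learned by training.

rng = np.random.default_rng(0)

x = np.array([0.5, -1.2, 3.0])    # input features
W1 = rng.normal(size=(4, 3))      # first layer: 3 inputs -> 4 hidden units
W2 = rng.normal(size=(1, 4))      # second layer: 4 hidden units -> 1 output

hidden = np.maximum(0, W1 @ x)    # each hidden unit "fires" only for certain input patterns
output = W2 @ hidden              # combine the detected patterns into one prediction

print(output)
```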

Reinforcement Learning: Similar to training a dog. The agent receives positive or negative feedback for each iteration of its training, so that it can learn which actions it should seek out and which it should avoid.
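
Here is a rough sketch of that feedback loop, using an invented two-action environment: the agent tries actions, receives positive or negative feedback, and gradually prefers the action that pays off.

```python
import random

# A toy reinforcement-learning loop with two made-up actions. The agent's value
# estimates drift toward the feedback it receives, so it learns to prefer "sit".

values = {"sit": 0.0, "jump": 0.0}   # the agent's current estimate of each action

def feedback(action):
    # The environment: "sit" is usually rewarded, "jump" is usually punished.
    return 1.0 if (action == "sit") == (random.random() < 0.8) else -1.0

for step in range(500):
    # Mostly pick the action currently believed to be best, but explore occasionally.
    if random.random() < 0.1:
        action = random.choice(list(values))
    else:
        action = max(values, key=values.get)
    # Nudge the estimate for that action toward the feedback just received.
    values[action] += 0.1 * (feedback(action) - values[action])

print(values)   # "sit" should end with a clearly higher estimated value than "jump"
```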

Objective Function: This is the goal of the AI program (it can also include subgoals). Using AlphaGo as an example again, the primary objective function would have been to win the game of Go.

Terms from the paper Concrete Problems in AI Safety that might not be obvious (all are also explained in the podcast):

  • Reward Hacking: When the AI system comes up with an undesirable shortcut for achieving its goal or objective function. For example, if a cleaning robot is rewarded for not seeing any messes, it might simply hide the messes or cover its own camera so it can’t see them anymore, rather than actually cleaning them up (see the short sketch after this list).
  • Scalable Oversight: Training an agent to solve problems on its own without requiring constant oversight from a human.
  • Safe Exploration: Training an agent to explore its surroundings safely, without injuring itself or others and without triggering some negative outcome that could be difficult to recover from.
  • Robustness to Distributional Shift: Training an agent to adapt to new environments and to recognize when its environment has changed, so it knows to be more cautious.
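
The toy calculation below (ours, with an invented scoring rule and actions) illustrates the reward hacking item above: if the reward only counts the messes the robot can see, covering its camera scores better than actually cleaning.

```python
# A toy illustration of reward hacking: the reward only counts messes the robot
# can see, so covering its own camera scores better than cleaning one mess.
# The world, actions, and scoring rule are invented purely for illustration.

messes = 5

def reward(visible_messes):
    return -visible_messes   # fewer visible messes = higher reward

def clean_one_mess(messes, camera_covered):
    return max(messes - 1, 0), camera_covered

def cover_camera(messes, camera_covered):
    return messes, True

for name, action in [("clean one mess", clean_one_mess), ("cover the camera", cover_camera)]:
    remaining, covered = action(messes, False)
    visible = 0 if covered else remaining
    print(f"{name}: reward = {reward(visible)}, messes actually left = {remaining}")

# "cover the camera" earns the highest reward even though no mess was cleaned --
# the designer's intent and the stated objective have come apart.
```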

Note from FLI: Among our objectives is to inspire discussion and a sharing of ideas. As such, we interview researchers and thought leaders who we believe will help spur discussion within our community. The interviews do not necessarily represent FLI’s opinions or views.
