Transcript: Concrete Problems in AI Safety with Dario Amodei and Seth Baum

Ariel Conn:  From the FLI Audio Files. I’m Ariel Conn with the Future of Life Institute.

This summer I watched all four of the White House Symposia on AI Research, which were designed to give AI researchers and others in the field a chance to discuss what the status of AI research is now, how it can help move society forward, and what risks we need to figure out how to address. There were many, many hours of talks and discussions, and I couldn’t help but notice how often researchers got up and were very adamant that talks should only be about short-term AI issues. They insisted that long-term AI concerns weren’t something that needed to be addressed right now. They didn’t want to worry about advanced AI, and they definitely didn’t want to think about superintelligence. But then, as soon as they started talking about their research, many of the issues that came up were related to things like control, and transparency, and bias.

Now, these are obviously issues that need to be addressed for short-term narrow AI, but there are also issues that need to be addressed for long-term, more advanced AI. And so I started to wonder why we were so worried about focusing on short-term vs. long-term artificial intelligence. Now somewhat famously in the AI world at least, Andrew Ng—who is with Baidu—has compared worrying about artificial long-term intelligence issues to worrying about overpopulation on Mars.

And I guess my reaction is that overpopulation is an issue, and it’s an issue that we need to address, and if we can solve overpopulation issues now so that we don’t have to worry about them later… why would we not do that? I realize there are probably some cost issues or other reasons that planning ahead is more difficult, but it seems like a really strange stance to me that we shouldn’t try to solve a current problem now, so that it doesn’t crop up again in the future.

I figured the  best way to try to understand what’s happening would be to turn to two people in the field. So I have with me Dario Amodei—who had been working at Google Brain, where he published a well-received paper titled Concrete Problems in AI Safety—and he’s recently moved to Open AI, which is focused on AI research and safety issues, as well as understanding the social implications of AI development. I also have Seth Baum—Executive Director of the Global Catastrophic Risk Institute. Much of Seth’s own research is on insuring AI safety. Seth, I’ll turn this over to you now.

Seth Baum:  Thanks Ariel. And these are really good questions, and we saw it at the White House Symposia. I attended one of those, but we also see it in a lot of different conversations across the AI world. And I think it’s a really important one, because while we might have some reasons for being especially interested in or concerned about the long-term AI issues, at the same time a lot of people are just more focused on short-term questions, and so it’s fair to ask what we can do to get action that works for both long-term and short-term AI issues. That’s why I think it’s good to have this conversation. It’s especially good to be having this conversation with Dario Amodei, who has recently published a paper on this topic, or rather, on the topic of AI safety in general and the issues that come up in AI safety, whether it’s short-term or long-term safety. Now, Dario, I want to get to the paper in a second, but first just a little background for our listeners: maybe you could say a little bit about how you became interested in the AI safety topic in the first place?

Dario Amodei:  Sure. I’ve actually been working in the AI field itself for a couple of years, and before coming to open AI I worked both at Baidu and Google, and I was drawn to the field by what I see as the incredible advances—particularly in deep neural networks—to solve problems in vision, speech, and language. I was involved in research in all of these areas, and found it very exciting, but one thing I definitely noticed—particularly with deep neural networks and with powerful ML systems in general—is that there are many ways they can be very accurate. A speech recognizer can tell you something and it can be almost as good as a human, but they can also be somewhat brittle. If I train with speech data on human speaking with a clean background with unaccented American speakers, they’ll perform great. But then if I test the same on accented speech or noisy data, it performs terribly.

And as the systems get deployed more into the world, having systems that fail unpredictably is not a good thing. And I think that impression was reinforced as I continued my work at Google, where you have issues with Google photo systems—which was based on a neural net classifier—that ended up accidentally classifying people of color as gorillas, which of course is an incredibly offensive thing to do. Now the neural net didn’t know that it was offensive, it was a combination of a problem with the classifier and a problem with the training data, but that machines can lack context for something and the classifier that’s produced by the machines can be something that has very bad real world impacts. If it does something that is not what we intended for it to do, that can be very harmful.

I think in the last year I have become particularly interested in reinforcement learning, which is a branch of machinery that’s concerned with interacting with the environment in a more intertwined way and is often used in things like robotics, self-driving cars, and autonomous systems. We’ve seen there was a recent announcement from Google that it’s used to control the power in their data centers. So once you’re actually interfacing with the world directly and controlling direct physical things, I think the potential for things to go wrong—which often I think are quite mundane—starts to increase. So I became more and more interested in if there were principle ways to reduce the risk or find some theoretical basis for guaranteeing that something bad won’t happen as we start to deploy these systems more into the world.

That’s kind of where my interest in AI safety started, and certainly there is the thought that these systems are advancing very quickly, and the more powerful they get the higher the stakes are for something that might potentially go wrong. So as someone who really wants to think about the social impacts of what I’m working on, this seemed like a really important area.

Seth Baum:  That makes a lot of sense, and it does seem that if you look at the systems themselves and what they’re doing, then it naturally follows that they can fail in these types of ways. And we saw it with the Google gorilla incident, which is kind of a classic example of, as you put it, an AI system failing predictably. A lot of my research is on the risk and especially the policy end of AI, and the same issue comes up in that context, because who do you hold liable? Do you really hold liable the company or the computer programmers who built this code? Because they didn’t want to do that! So for the legal management of these types of software it’s a challenge, because you don’t want to hold these people liable for intentionally causing these harms, yet at the same time these are systems that are almost by design bound to behave in unpredictable ways. And we’re just going to see this happen more and more. My impression at least is that that is a little part of the motivation for your paper. You wrote a paper recently called Concrete Problems in AI Safety that seems to take on some of these topics. Maybe you could say a little more about the paper itself and how you wanted to write it. How it contributes to these issues.

Dario Amodei:  Yeah, so as I got more and more interested in these safety issues, I did what any researcher does and looked into the literature that had been written in the machine learning literature so far about these problems. And there actually was a fair amount of literature about various subsets of this sort of thing that I was worried about… maybe there wasn’t a substantial literature, but there was, I would say, kind of a staggered nature to it, where we get into some of the specific problems that I talk about in the paper. But the four or five different problems that I was kind of thinking in my head—of ways to classify things that could go wrong in an ML system—were often written in parts of the literature that were not very much related to one another, and often were very specific to particular applications.

So I felt that what would help a lot is something that is sort of a combination of reviewing all the existing work that has been done in one place, and also having an agenda that talks about what needs to be done in the future. In particular, a lot of this work had been done quite a while ago, and so really writing this review and agenda with view towards the cutting-edge ways in which machine learning has advanced in the last three or four years with neural nets doing vision, speech, language, game playing, autonomous driving, and a bunch of other applications. I felt like a lot of the thinking about making systems safe, controllable, robust, and predictable could really use an update in light of these advances.

Then there was another stream, where for a while I’d been aware of the work of people like Nick Bostrom and Eliezer Yudkowsky, who come from outside the ML field, and have been warning about these very long-term considerations involving AIs that are smarter than humans. I read their work, and my attitude towards it has always been that I found it quite thought provoking, but if as a researcher I wanted to think about ways in which machine learning systems can go wrong, it is very important that we—for now—stick to the kind of scenarios that have been built today. Then if future scenarios do come up, we’re better equipped to think about those scenarios. So between those two poles, that there was a lot of existing literature that I felt needed to be drawn together a little bit, and that there was more of the high-level, kind of visionary thinking about the far future, I felt that there was a middle space of thinking in a principle but much more concrete way about the systems that we’re building now or likely to build in the next few years, and what general lessons can we learn about how ML systems go wrong.

That was the thought with which we sat down and wrote the paper, and it turned out that it was not just me, but there were several other authors in the paper. My main co-author was Christopher Olah—also from the Google Brain team—who’s done a lot of work on visualization and blogs for teaching machine learning to a wide audience. And we had some collaborators from Berkeley, Paul Christiano from Stanford, Jacob Steinhardt from Open AI before I joined, John Shulman, and another Google Brain researcher Dan Mane. We all found that we have the same general perspective and vision on the paper relative to what happened in this space before. We all worked together and spent a bunch of time, and eventually we produced this paper.

Seth Baum:  Okay, so perhaps you could take us into the paper a little bit. There are five concrete problems: avoiding negative side effects, avoiding reward hacking, scalable oversight, safe exploration, and robustness to distributional shift. These make sense when you get into the paper, but to just hear them spoken like this is kind of ambiguous. Maybe we can go one at a time through each of these five problems, and you can explain in basic terms what they mean, starting with “avoiding negative side effects.”

Dario Amodei:  So actually, before going into the five problems in detail, one frame that I thought of as a little useful for thinking about the problem that we describe in the paper is that we actually split the five problems into three general categories. I’ve always found it useful to start thinking about those categories, and then getting into the problems.

If you’re trying to build a machine learning system, one of the most important pieces is the objective function, which really defines the goal of the system or a way of judging whether it’s doing what you want it to do. For example, if you’re building a system that is supposed to recognize speech, then you might measure what fraction of words it gets correct vs. incorrect—the word error rate. If you’re building an image classification system, the objective function is the fraction of time that it identifies the images in the correct class. If you’re building a system to play Go—like the AlphaGo system—then it’s the fraction of the games you win, the probability that you win a game. So when thinking about building the machine learning system, and for whatever reason the machine behaves in some way that you didn’t intend and don’t like, one of the ways I found useful is to think about exactly where in the process things go wrong. The first place that things go wrong is if your objective function wasn’t actually right. I’ll go into a little more detail on that later, but the idea is that you’re putting pressure on the machine learning system to do a particular thing, and rewarding it for doing a particular thing. Unbeknownst to you, that was actually the wrong thing to do. The system then ends up behaving in a harmful way, because you had in mind that it would behave a certain way, you tried to formalize that with the objective function, and the objective function was the wrong objective function.

The second class is if you do know the objective function, but it’s very expensive to evaluate it. This might be human judgement, or we have a limited budget for a human supervising and checking in on the AI system, and the system has to extrapolate. It might do the wrong thing because it hasn’t really seen what the correct objective function is.

The third thing is when our system has the right objective function, but our machine learning system—as indicated by the phrase “machine learning”— has to learn. And there is a concern that while the system is in the process of learning, when it doesn’t understand the world around it to the best that it can, it might do something harmful while it’s not fully trained. That for me, from the perspective of a researcher, has been a natural way to decompose the problems. So now maybe we can go into the five problems.

Seth Baum:  Before we dive into the five problems, let me try speaking these back at you to make sure I understood them correctly. The first one is just the wrong objective function, that is, you gave it goals that in retrospect turn out to be goals that you’re not happy with. It fulfills those goals, but you wish that it was fulfilling some other goals.

Dario Amodei:  Correct.

Seth Baum: Now the second one, you said the objective function is expensive to evaluate. That means, you gave it good goals and it is working towards those goals, but it’s struggling, and instead of fulfilling those goals It’s doing something else that’s causing a problem.

Dario Amodei:  Right, because it has a limited ability to assess what the right goal is. So, the right goal might be to make the human happy with everything that I’m doing, but I can’t in every little action ask the human if he’s happy with what I’m doing. And so I might need some kind of cheaper proxy that predicts whether a human is happy. And if that goes wrong, then the system could do something unpredictable.

Seth Baum:  Okay, then the third one is when problems occur during the training process… maybe you could say what this training process is and how can we get harmed during it?

Dario Amodei: Sure, maybe we’re going into those first two problems as a sub-category, but the general idea is that if you had to think of a child, when they don’t really understand the world around them. Say, if they press this button on the stove a fire will turn on that could burn them or burn someone else. The child might be using the right process to learn about the world, but if they’ve never touched the stove before they don’t know what might happen and they don’t know if they might hurt someone.

Seth Baum:  Okay, makes sense to me. Should we go ahead and dive into the five?

Dario Amodei: Yeah, so the first one you asked about was “avoiding negative side effects.” This is one problem under the sub-category of having the wrong objective function. One way to introduce this is to say that in some sense machine learning systems are very literal minded. If you tell them to do X, they’ll do exactly X. The example we give in the paper: let’s say I have a cleaning robot that is trying to move a box from one side of a room to another, and in this very simple example all it tries to do is move the box. I might give it an objective function that basically says, “you get points (a reward) for moving the box from one side to the other,” and that’s all that matters to me. But if you give it literally just that one objective, then actually you’re implicitly telling it that anything else in its environment it doesn’t care about. If a vase is in its path, then it doesn’t in any way in terms of its objective function get penalized for knocking over this vase. It can just walk from one side of the room to the other and may just knock over the vase and not care about it.

You can generalize this to having a robot, and it’s really very focused on accomplishing a particular task—moving a particular thing—but the world is big and humans, when we walk around performing tasks, have to be very sure that when we drive our children to school that we don’t run someone over with the car or do something else. We are never just explicitly doing one task. I’m always doing one task and making sure that the rest of the world is okay, that I don’t do anything really damaging. As a human I have common sense and I know this, but our machinery systems at the present at least are not at the point where they have common sense. And so I see a potential for various things to go wrong in this case.

Seth Baum:  That really sounds like the classic Genie story. Right down to the genie in the bottle, you rub it, you get your wish. And it gives you your wish exactly what you asked for whether you like it or not. What you’re saying is that a machine-learning AI system has that same sort of taking things very literally

DS:  Or could. At least the systems that we build today often have that property. I mean, I’m hopeful that someday we’ll be able to build systems that have more of a sense of common sense. We talk about possible ways to address this problem, but yeah I would say it is like this Genie problem. For this specific case, one of the specific things that can go wrong is that the world is big and that it’s very easy when you’re training the machine learning system to focus on only a small aspect of it, and that gives you a whole class of things that can go wrong.

Seth Baum: Okay sounds good. Let’s move on to the second one: “avoiding reward hacking.” What’s that all about? So, reward hacking is this kind of situation where you write down an objective function, and it turns out that your objective function was trying to capture the ability to do something hard, or you felt like in order to achieve this objective function, we need to do some hard task, that is, the task you’re trying to get the machine learning system to do. But often there is some way of cheating the objective function that’s been written down. So the example given in the paper is if you have a clean robot that’s kind of trying to clean up all the messes it can see, and you decide to give it an objective function that says, “well how much dirt can you see around you?” You might think that would be a good measure of how clean the environment is. Then the cleaning robot—if it were designed a certain way—could just decide to close its eyes, or it could decide to kind of shovel a bunch of messes under a desk or into a closet.

And in fact this isn’t limited to machine learning systems. It’s a problem we have with humans as well, right? If I hire a cleaner, most cleaners are honest, but you know in theory if I didn’t check and I hired a dishonest cleaner, they might find it easier to just shove all the messages in my house into some closet that they think I won’t look in. And you know, again, there’s this thing that machine learning systems can be literal-minded at least in the way we design them today, and there’s all types of things that can go wrong. In the paper we discuss some general factors that lead to this, and one factor is when the robot with the machine learning system is not able to see everything in its environment, it’s very easy for the wrong kind of objective function to give incentives to hide aspects of the environment from itself for others—like the shoveling things into the closet case. We discuss a few of the general ways this can happen and a few thoughts and recommendations for designing objective functions where this is less likely to happen.

Seth Baum:  I’m reminded of those cute pictures you see on the Internet of kids taking quizzes, and it will be something like this long, difficult math problem, and below it says write the answer here and the student just writes the words “the answer” below it and tries to pass it in because they don’t know how to actually do the math problem.

Dario Amodei:  Yeah

Seth Baum:  We need to get our AI systems to not behave like mischievous little kids.

Dario Amodei:  Yeah there was the blogger who wrote about our work—I think it was Cory Doctorow—and he said a lot of the problems that I’ve read in this paper remind me of issues in child development in child psychology, and there is a sense in which a lot of these systems are kind of like savants, right? They’re like small children who don’t have enough common sense to know how do quite the right thing, but at the same time they are very voracious learners who can process a lot of information. So I do see a lot of commonalities there.

Seth Baum:  Makes sense. Okay, so let’s move on. The next one is scalable oversight.

Dario Amodei:  Yeah so the kind of the example we get there is let’s say that we have a cleaning robot again, and it’s trying to clean up a room but there are some objects in the room which might belong to a human. There’s a surefire way to get the right objective function, which is for every time you find an object, you ask a human. You ask the human, “does this object belong to you?” You ask every human who could possibly own it, “does this object belong to you?” You always do the right thing, but a robot that does that is impractical, and no one would sell any robots to do that. if that’s going to happen I might as well just clean it myself. I don’t want my robot asking me questions every two minutes while it’s trying to clean my house. So can we find ways where the robot maybe only asks me the first two times and gets a good sense of what kind of stuff I would actually own versus what kind of stuff it’s okay to throw away. Maybe it looks at cues of where I leave the stuff. And so the way to state the problem is: if the robot tries to do this, there is the risk that it will throw away things that I really would have wanted. And the solution is: are there ways we can get the robot to think about—from repeated experience—being able to predict the true objective function, which is what really belongs to me and what I really want thrown away without actually having to ask me every time, which might then destroy the economic value of the system.

Seth Baum:  That seems like something that humans face all the time, right? You’re cleaning, and do you really know whether they want that thrown away or not? Sometimes it’s a candy wrapper. That should be pretty obvious, unless it’s maybe a candy wrapper with a winning lottery ticket printed on the inside, or some prize competition that the candy company had. But it seems really easy for robots or AI systems to make the same sorts of mistakes.

Dario Amodei:  Yeah

Seth Baum:  Do you think it’s particularly difficult to train an AI to get those sorts of questions right? That seems like, maybe I don’t know the systems well enough, but that seems like something that we should be able to train an AI to figure out without too much difficulty.

Dario Amodei:  In the particular case of the cleaning robot, I’m pretty optimistic. But I think, you know it’s designed more to be kind of a parable that illustrates, that often there are aspects of human preferences that are quite subtle, and getting all of them right without an unacceptable amount of communication with humans, or halting the workflow of the machine learning system can often be quite subtle. A human might be able to look at something like a concert ticket and if the day for the concert ticket was yesterday, then they might just know that it was okay to throw it away. But if it was tomorrow then they would say, “oh, that’s really valuable, this is something someone’s going to use.” So there’s just a lot of subtle things like that that I think take some work to get right.

Seth Baum:  Okay, sure. Now let’s move. The next one is “safe exploration.” What’s that?

Dario Amodei:  So this is actually a problem that’s been worked on a lot in the machine learning community, and so our work here was more to summarize prior work, and also point towards how work in this area could be integrated with a lot of advances that we’re seeing in robotics, and whether it’s possible to step up the reach of work in this area.

The basic idea here is that—particularly in reinforcement learning, which I mentioned earlier is the branch of machine learning that deals with systems that interact with the environment in a very intertwined wave—there’s a trade-off between exploring and exploiting; between doing the thing that I think is best right now and understanding my environment better, which might lead to me understanding it better and realizing that there are even better things that I can do. But the problem is, that when I’m exploring an unknown environment, often there are aspects that I’ve never dealt with before, and so I can do something dangerous without knowing what I’m doing.

The example gave with the cleaning robot is, you know, maybe it’s never seen an electrical outlet before. And so it wants to experiment with cleaning strategies and tries to stick a wet mop in the electrical outlet. Obviously this is just going to be really bad for the robot. Another example that has actually come up with actual robots people have built is robot helicopters. The idea is if I want to train my robot helicopter how to fly properly; I want to use reinforcement learning to learn how to fly. One problem we can have is that, you know, if it’s experimenting with spinning propellers, and doesn’t really understand the dynamics of flying very well… if it does something bad and ends up crashing, it could break its propeller or break it’s control system or something, and then you can’t use the robot helicopter anymore, right? The system is broken and you need to get a new one, and the designer for the system won’t be very happy. And yet the system needs to learn somehow, and so again this is a problem children encounter, right? Children need to try things on their own to understand what works and what doesn’t, but it’s also very important that they don’t do things that are truly dangerous that they couldn’t recover from if it goes wrong. And you know to some extent children have an instinct for this, and it’s part of the role of parents to keep children from going into truly dangerous situations. But it’s something that our machine learning systems currently grapple with, and I think are going to need to grapple with more and more.

Seth Baum:  That seems like something that we all grapple with on a pretty regular basis. For myself, as an academic I’m constantly worrying about whether I’m spending too much time thinking about stuff, and researching and learning more and so on versus just going ahead and writing what I have and forming an opinion and getting out there and saying what I have to say on the topic… and it’s a dilemma, right, how hard we try to figure things out before we do things. In the tech community they have the old Facebook saying “move fast and break things,” which for some contexts works well, but for other contexts does not work so well. And actually Facebook has changed. It’s now “move fast with stable infrastructure,” something that sounds more responsible and not nearly as catchy. So yeah I guess an AI would have to face the same sorts of issues, right?

Dario Amodei: Yeah, I mean I think the problem of machine learning and AI is to get machines to do some of the same tasks that humans do. So in some sense I think it’s not surprising that in doing some of the same task humans do they run into a lot of the same problems that humans do.

Seth Baum:  Okay so we got one more of the concrete problems, and it’s called “robustness to distributional shift.”

Dario Amodei:  This is the idea that often if you have a machine learning system, we have the notion of kind of training data and test data. A machine learning system often get trained on one particular type of data, but then when it’s deployed in the real world it often finds itself in situations or faced with data that might be different from the data that it was trained on. Our example with the robot is: let’s say we have a robot that’s been trained to clean factory work floors. It’s kind of learned that you should use harsh chemicals to do this, and that it needs to avoid lots of metal obstacles. You then deploy it in an office, and it might engage in some behavior that’s inappropriate. It might use chemicals that are too harsh, it might not understand how the office is setup, etc., etc.

I think another example of this actually is the gorilla example that occurred with Google a year ago, where one of the problems with that photo captioning app was that a lot of its training data had been trained on caucasian individuals, and it had seen monkeys, but it had never seen an individual with different skin color. So it made a very inappropriate inference based on insufficient training data. Our interest in robustness to distributional shift is in trying to both detect and remedy situations where you’re seeing something that is different than what you’ve seen before. The photo caption system should have said, “this is something that doesn’t actually look like anything that I’ve seen before or any of the classes that I’ve seen before—it’s actually something different, and I should be very careful of what class I assign this to, because I don’t have high confidence about the situation and I’m aware that I’m facing data that’s different from the data that I was trained on.” It’s not only possible to respond appropriately to a totally new situation or totally new perception that I might receive, but it seems like it is possible to recognize that what I’m seeing is different from what I’ve seen before. So that’s when the paper discusses how to recognize that and how to be appropriately cautious once you recognize it.

Seth Baum:  It’s really interesting to me just listening to you talk about these different challenges of designing an AI to behave in ways we would want to behave. How similar to me at least it sounds like child development and human behavior and challenges that we all face. It makes me feel like these artificial intelligence systems are already not so different from us, at least in these types of ways.

Dario Amodei:  Well I would definitely say that certainly the systems that we are building today are very limited systems. I do want to emphasize that we are nowhere near building systems that can replicate the incredible range of behaviors that humans are capable of. However I would say that within the particular tasks that we assign to machine learning systems… yeah I think so many of the problems that they face in learning those specific tasks often—not always—have analogies to the challenges in human space in learning those tasks.

Seth Baum:  Okay, so they’re still not at all as capable as we are across the board, but still face some of the same challenges we do.

Dario Amodei:  Yeah.

Seth Baum:  Okay, very good. So I want to bring it back to the conversation we started out with, which is on what we saw at the White House symposium, what we see in other contexts, about this short-term AI versus long-term AI issues and people caring more about short-term than long-term, and so on. With that in mind, I’m curious what the reaction has been to your paper, right? Do people say, “oh this is crazy,” or do these ideas seem reasonable to them?

Dario Amodei:  Actually the reaction has been extremely positive more so than I even anticipated. There’s a few different communities that read and looked at our work. The first was, because it was published on the Google research blog and ended up getting covered by the media, and their reaction was actually quite positive. Most of the stories had titles like: “Google Gets Practical About AI Concerns,” or “Google Addressing AI Concerns Responsibly.” The idea was that it was kind of a set of serious engineers who are really sitting down to think about just very specifically what can go wrong in machine learning systems, and how can we prevent those things from happening so that machine learning systems can benefit everyone. It was very that, and not the kind of alarmist “Terminator robots are going to kill us all” kind of thing. So I feel the media understood actually pretty well, which surprised me a little bit.

Seth Baum:  What about from the AI community?

Dario Amodei:  Yeah, so the AI community was also extremely positive. Even people like Oren Etzioni—who have been vocal spokespeople against more long-term concerns about AI and risks of AI— you know, they were very positive about this paper. Oren Etzioni was quoted in one of the news articles as saying, “These are the right people asking the right questions.” And I think a lot of the doubts that people like Etzioni have had about long-term AI risk has just been a kind of vagueness: well how do you work on this? I think the reaction to our paper was very positive because it gave the problem posed in a way where you can actually sit down and write a paper on it, and actually, by the way, we intend to follow up on the paper, this was more kind of an agenda paper, but we intend to follow it up by writing papers trying to address these problems in actual real systems. And you know that’s one of the things I’m working on at open AI, and we’re going to continue to collaborate with Google on it. I think that the concreteness part of it and the practicality and the promise of real empirical work that I hope we can deliver on made a lot of the I community seem pretty excited about it.

And then finally there’s the community of people that have been worried about long-term AI risks. I think the reaction there was pretty positive as well, even though the focus of the paper was on shorter term issues as Ariel pointed out. The beginning of this, conceptually a lot of the longer term risks that they’re worried about can be seen as instances of some of the problems we’ve talked about, in particular the negative side effects and reward hacking, but actually all of them are things that—when I try to think about what someone like Nick Bostrom is talking about—I think what they’re talking about is the kind of problems were talking about, the concrete problems paper… if you have those problems with an extremely powerful AI system that is even more powerful than humans, then I think that’s how you get to some of the scenarios that Bostrom is describing. I definitely have a disagreement with the AI safety community, which is, it’s not that I don’t think we may face these extreme scenarios eventually, and I’m glad there’s someone thinking about them, but I at least am most interested in thinking about problems that we can attack empirically today. And I hope that those problems that we attack today will start to shed light on the longer-term issues that we have.

Again, if we really work on things like reward hacking and avoiding negative side effects, I think if we work on them in the right way, there will be a lot of relevance to the scenarios that people worried about AI risk are worried about. Eventually many of the things they’re talking about, the things they write about, maybe they’ll become very relevant someday. But my difference is more tactical;  I just see a great deal of importance to having the empirical feedback loop. To saying, “this is a problem I think the system might have, let me test it… oh it has this part of the problem but not this part of the problem, but let me do another iteration on it.” Just in research and science in general I feel we’ve got a lot of mileage out of the empirical feedback loop, and so that’s something that I emphasize a lot.

Seth Baum:  I’m really glad to hear that the response has been so positive. This seems to me like the sort of clever solution that we need for problems like AI safety, that can resonate across seemingly disparate audiences. We’ve had a lot of disagreement between the people who are worried about the future of superintelligence risk versus the AI researchers who are out there building new systems today, and my impression is that the difference of opinion between these two groups only goes so far, in that both of them as far as I can tell genuinely do care about the social impact of their work. It might not be the core focus of everyone’s attention, and I think this is an issue that needs to be addressed within the AI community.

Stewart Russell has a great line about how the AI community needs to take social impacts more seriously, he compares it to civil engineers, he says no one in civil engineering talks about building bridges that don’t fall down, they just they just call it building bridges, right? Because in civil engineering everyone takes for granted that the social impact of their work really matters, whereas in AI (according to Stuart Russell) that’s less the case… but the impression I have listening to AI researchers, a lot of them do actually care about the social impacts of their work, they’re just not sure about this superintelligence thing. It’s remote into the future, it’s speculative, maybe even sounds a little odd and it’s just so far removed from the systems that they’re working on. So to see opportunities to address the General AI safety concerns—that may also be relevant to superintelligence but are very much relevant to the systems that people are building today—it makes sense to me that they would respond positively to that sort of message.

And I wonder if there’s anybody especially within the AI research communities that is pushing back against it, saying, “no, we should just be focused on building AI systems that are more capable and we shouldn’t worry about these safety problems.” Have you gotten that at all?

Dario Amodei:  I don’t think I’ve ever had someone say that specifically to me. I think there is probably a healthy debate to some extent in the machine learning community about which social impacts we should care the most about. Some of my colleagues are very interested—and I am actually, too—in things like the economic impact of machine learning systems, or in fairness. I think definitely people differ in how much they choose to focus on each of these issues, but I haven’t really encountered anyone who says, “we shouldn’t think about any of these issues,” or who says, “this is the only issue that we should think about.” I think, when properly explained, the risk of AI systems doing things that we didn’t intend… everyone says, “yes, that’s something you should prevent.” The risk of AI systems treating people unfairly, that’s something “yes we should prevent.” The risk of bad economic impacts of AI everyone says, “yes that is something we should prevent.” Internet security issues that could arise with AI everyone says, “yes that’s something we should definitely prevent.” Different people are interested in working on these to different extents, and for some people this isn’t a personal research interest to them, but I actually haven’t found anyone who says, “no, I don’t think anyone should work on these things.” Maybe such people do exist, but I haven’t met any.

Seth Baum:  Maybe that’s the bigger challenge with this? It’s not people who actively push back against work on these problems, but people who just essentially ignore it. I remember from my own engineering days—and I really like to bring up social impacts and social issues related to our research—my fellow engineers would listen to me and then they would basically be like “Okay, that’s nice. Now get back to work,” because in their minds, thinking about the social aspect of it was someone else’s job, and so it’s easy to imagine AI researchers not really disagreeing with the sorts of things that you’re saying, but just thinking that maybe this is somebody else’s responsibility to worry about. Do you see that at all?

Dario Amodei:  I have definitely heard people say that, but to be fair I don’t have a huge objection to some fraction—maybe even a large fraction of the field—having that attitude. I mean, I think research is a process of specialization, and not everyone can work on everything. If your attitude is “I just want to make a better speech system. I know that machine learning has social impacts, and someone else is working on that.” If a decent fraction of the field takes that attitude, I’m fine with it. My concern is more that we, as a field collectively, that we are on the issue, that we have enough people within the field who do want to think about these issues. If there’s no one or too few people in the field who want to think about these issues, then I think that’s a problem, because I think it’s our responsibility as researchers to think about the impact of the research that we’re doing. But if a particular person says, “that’s not my cup of tea, that’s not my focus area,” I’m fine with that. I think that’s the way research works. But I would say that right now the fraction of people doing this, at least a year ago, I would say was too low. Now I think thankfully we’re starting to get more and more people into this and maybe getting to a healthier place.

Seth Baum:  Okay, that was what I was going to ask you. Because I presume based on the fact that you’re speaking up on this topic, that there’s a sense there should be more work going on on this? It seems like the paper you wrote was, as you put it, an agenda—a call for action, a call for research on these topics. It’s very encouraging to hear that you think that the fraction of people working on these safety problems is going up. Would you still say that it should be going up more? Or do you think that we’re actually reaching a pretty comfortable place right now?

Dario Amodei:  I mean, it’s all kind of coming into place. Since I joined OpenAI, I’ve had a number of people say, “I’m interested in these topics; I want to work on these topics.” The new people coming into the ML field and people who have been in there in a while. So I actually don’t know where things will end up, and my main goal is just to get some good technical research done on these topics. Then we’ll see if there needs to be more people in the field. And my hope is the usual dynamic: there’s a lot of interesting results found in one place, then more people come into the field. Then if there’s too many people working on something, some people go somewhere else. I’m hopeful that those normal dynamics will get us to a place where we’re thinking responsibly. That may be too optimistic, but that’s my hope.

Seth Baum:  That makes sense to me, and let’s hope that things do balance out there. In my experience, researchers don’t necessarily always gravitate towards just where the right topics are, what research most needs to be done, and you end up with too many people crowding in one seemingly popular area. But we’ll see. I’m really glad to hear this, and it would be great if these safety problems could then just solve themselves as more AI researchers work on them.

Dario Amodei:  Yeah, that would be my hope for what would happen.

Seth Baum:  Okay, thank you. Any final thoughts you’d like to add to this conversation before we sign off?

Dario Amodei:  You know I think my perspective is that empirical and testable work on unintended consequences of machine learning systems is the best way to illuminate these problems and figure out where to go next.

Seth Baum: Okay, thank you.

Ariel Conn:  I want to thank you both for sitting down and having this discussion. I think this helps shed a lot of light, at least on the issues I saw at the White House symposia. And it’s been a really great overview of where we’re at with AI safety research today. So Dario and Seth, thank you very much.

Seth Baum:  Thank you.

Dario Amodei:  Thank you for having me.



New Center for Human-Compatible AI

Congratulations to Stuart Russell for his recently announced launch of the Center for Human-Compatible AI!

The new center will be funded, primarily, by a generous grant from the Open Philanthropy Project for $5,555,550. The center will focus on research around value alignment, in which AI systems and robots will be trained using novel methods to understand what a human really wants, rather than just relying on initial programming.

Russell is most well known as the co-author of Artificial Intelligence: A Modern Approach, which has become the standard textbook for AI students. However, in recent years, Russell has also become an increasingly strong advocate for AI safety research and ensuring that the goals of artificial intelligence align with the goals of humans.

In a statement to FLI, Russell (who also sits on the FLI Science Advisory Board) said:

“I’m thrilled to have the opportunity to launch a serious attack on what is — as Nick Bostrom has called it — ‘the essential task of our age.’ It’s obviously in the very early stages but our work (funded previously by FLI) is already leading to some surprising new ideas for what safe AI systems might look like. We hope to find some excellent PhD students and postdocs and to start training the researchers who will take this forward.”

An example of this type of research can be seen in a paper published this month by Russell and other researchers on Cooperative Inverse Reinforcement Learning (CIRL). In inverse reinforcement learning, the AI system or robot has to learn a human’s goals by observing the human in a real-world or simulated environment, and CIRL is a potentially more effective method for teaching the AI to achieve this. In a press release about the new center, the Open Philanthropy Project listed other possible research avenues, such as:

  • “Value alignment through, e.g., inverse reinforcement learning from multiple sources (such as text and video).
  • “Value functions defined by partially observable and partially defined terms (e.g. ‘health,’ ‘death’).
  • “The structure of human value systems, and the implications of computational limitations and human inconsistency.
  • “Conceptual questions including the properties of ideal value systems, tradeoffs among humans and long-term stability of values.”

Other funders include the Future of Life Institute and the Defense Advanced Research Projects Agency, and other co-PIs and collaborators include:

  • Pieter Abbeel, Associate Professor of Computer Science, UC Berkeley
  • Anca Dragan, Assistant Professor of Computer Science, UC Berkeley
  • Tom Griffiths, Professor of Psychology and Cognitive Science, UC Berkeley
  • Bart Selman, Professor of Computer Science, Cornell University
  • Joseph Halpern, Professor of Computer Science, Cornell University
  • Michael Wellman, Professor of Computer Science, University of Michigan
  • Satinder Singh Baveja, Professor of Computer Science, University of Michigan

In their press release, the Open Philanthropy Project added:

“We also believe that supporting Professor Russell’s work in general is likely to be beneficial. He appears to us to be more focused on reducing potential risks of advanced artificial intelligence (particularly the specific risks we are most focused on) than any comparably senior, mainstream academic of whom we are aware. We also see him as an effective communicator with a good reputation throughout the field.”

Podcast: Concrete Problems in AI Safety with Dario Amodei and Seth Baum

Many researchers in the field of artificial intelligence worry about potential short-term consequences of AI development. Yet far fewer want to think about the long-term risks from more advanced AI. Why? To start to answer that question, it helps to have a better understanding of what potential issues we could see with AI as it’s developed over the next 5-10 years. And it helps to better understand the concerns actual researchers have about AI safety, as opposed to fears often brought up in the press.

We brought on Dario Amodei and Seth Baum to discuss just that. Amodei, who now works with OpenAI, was the lead author on the recent, well-received paper Concrete Problems in AI Safety. Baum is the Executive Director of the Global Catastrophic Risk Institute, where much of his research is also on AI safety.

Not in a good spot to listen? You can always read the transcript here.

If you’re still new to or learning about AI, the following terminology might help:

Artificial Intelligence (AI): A machine or program that can learn to perform cognitive tasks, similar to those achieved by the human brain. Typically, the program, or agent, is expected to be able to interact with the real world in some way without constant supervision from its creator. Microsoft Office is considered a computer program because it will do only what it is programmed to do. Siri is considered by most to be a very low-level AI because it must adapt to its surroundings, respond to a wide variety of owners, and understand a wide variety of requests, not all of which can be programmed for in advance. Levels of artificial intelligence fall along a spectrum:

  • Narrow AI: This is an artificial intelligence that can only perform a specific task. Siri can look up anything on a search engine, but it can’t write a book or drive a car. Google’s self-driving cars can drive you where you want to go, but they can’t cook dinner. AlphaGo can beat the world’s best Go player, but it can’t play Monopoly or research cancer. Each of these programs can do the program they’re designed for as well as, or better than humans, but they don’t come close to the breadth of capabilities humans have.
  • Short-term AI concerns: The recent increase in AI development has many researchers concerned about problems that could arise in the next 5-10 years. Increasing autonomy will impact the job market and potentially income inequality. Biases, such as sexism and racism, have already cropped up in some programs, and people worry this could be exacerbated as AIs become more capable. Many wonder how we can ensure control over systems after they’ve been released for the public, as seen with Microsoft’s problems with its chatbot Tay. Transparency is another issue that’s often brought up — as AIs learn to adapt to their surroundings, they’ll modify their programs for increased efficiency and accuracy, and it will become increasingly difficult to track why an AI took some action. These are some of the more commonly mentioned concerns, but there are many others.
  • Advanced AI and Artificial General Intelligence (AGI): As an AI program expands its capabilities, it will be considered advanced. Once it achieves human-level intelligence in terms of both capabilities and breadth, it will be considered generally intelligent.
  • Long-term AI concerns: Current expectations are that we could start to see more advanced AI systems within the next 10-30 years. For the most part, the concerns for long-term AI are similar to those of short-term AI, except that, as AIs become more advanced, the problems that arise as a result could be more damaging, destructive, and/or devastating.
  • Superintelligence: AI that is smarter than humans in all fields.

Agent: A program, machine, or robot with some level of AI capabilities that can act autonomously in a simulated environment or the real world.

Machine Learning: An area of AI research that focuses on how the agent can learn from its surroundings, experiences, and interactions in order to improve how well it functions and performs its assigned tasks. With machine learning, the AI will adapt to its environment without the need for additional programming. AlphaGo, for example, was not programmed to be better than humans from the start. None of its programmers were good enough at the game of Go to compete with the world’s best. Instead, it was programmed to play lots of games of Go with the intent to win. Each time it won or lost a game, it learned more about how to win in the future.

Training: These are the iterations a machine-learning program must go through in order learn how to better meet its goal by making adjustments to the program’s settings. In the case of AlphaGo, training involved playing Go over and over.

Neural Networks (Neural Nets) and Deep Neural Nets: Neural nets are programs that were inspired by the way the central nervous system of animals processes information, especially with regard to pattern recognition. These are important tools within a machine learning algorithm that can help the AI process and learn from the information it receives. Deep neural nets have more layers of complexity.

Reinforcement Learning: Similar to training a dog. The agent receives positive or negative feedback for each iteration of its training, so that it can learn which actions it should seek out and which it should avoid.

Objective Function: This is the goal of the AI program (it can also include subgoals). Using AlphaGo as an example again, the primary objective function would have been to win the game of Go.

Terms from the paper, Concrete Problems in AI Safety, that might not be obvious (all are explained in the podcast, as well):

  • Reward Hacking: When the AI system comes up with an undesirable way to achieve its goal or objective function. For example, if you tell a robot to clean up any mess it sees, it might just throw away all messes so it can’t see them anymore.
  • Scalable Oversight: Training an agent to solve problems on its own without requiring constant oversight from a human.
  • Safe Exploration: Training an agent to explore its surroundings safely, without injuring itself or others and without triggering some negative outcome that could be difficult to recover from.
  • Robustness to distributional shifts: Training an agent to adapt to new environments and to understand when the environment has changed so it knows to be more cautious.

MIRI August 2016 Newsletter

Research updates

General updates

  • Our 2015 in review, with a focus on the technical problems we made progress on.
  • Another recap: how our summer colloquium series and fellows program went.
  • We’ve uploaded our first CSRBAI talks: Stuart Russell on “AI: The Story So Far” (video), Alan Fern on “Toward Recognizing and Explaining Uncertainty” (video), and Francesca Rossi on “Moral Preferences” (video).
  • We submitted our recommendations to the White House Office of Science and Technology Policy, cross-posted to our blog.
  • We attended IJCAI and the White House’s AI and economics event. Furman on technological unemployment (video) and other talks are available online.
  • Talks from June’s safety and control in AI event are also online. Speakers included Microsoft’s Eric Horvitz (video), FLI’s Richard Mallah (video), Google Brain’s Dario Amodei (video), and IARPA’s Jason Matheny (video).

News and links

See the original newsletter on MIRI’s website.

Op-ed: Education for the Future – Curriculum Redesign


“Adequately preparing for the future means actively creating it: the future is not the inevitable or something we are pulled into.”

What Should Students Learn for the 21st Century?

At the heart of ensuring the best possible future lies education. Experts may argue over what exactly the future will bring, but most agree that the job market, the economy, and society as a whole are about to see major changes.

Automation and artificial intelligence are on the rise, interactions are increasingly global, and technology is rapidly changing the landscape. Many worry that the education system is increasingly outdated and unable to prepare students for the world they’ll graduate into – for life and employability.

Will students have the skills and character necessary to compete for new jobs? Will they easily adapt to new technologies?

Charles Fadel, founder of the Center for Curriculum Redesign, considers six factors – three human and three technological – that will require a diverse set of individual abilities and competencies, plus an increased collaboration among cultures. In the following article, Fadel explains these factors and why today’s curriculum may not be sufficient to prepare students for the future.


Human Factors

First, there are three human factors affecting our future: (1) increased human longevity, (2) global connectivity, and (3) environmental stresses.

Increased Human Longevity

The average human lifespan is lengthening and will produce collective changes in societal dynamics, including better institutional memory and more intergenerational interactions.  It will also bring about increased resistance to change. This may also lead to economic implications, such as multiple careers over one’s lifespan and conflicts over resource allocation between younger and older generations. Such a context will require intergenerational sensitivity and a collective systems mindset in which each person balances his or her personal and societal needs.

Global Connectivity

The rapid increase in the world’s interconnectedness has had many compounding effects, including exponential increase in the velocity of the dissemination of information and ideas, with more complex interactions on a global basis. Information processing has already had profound effects on how we work and think. It also brings with it increased concerns and issues about data ownership, trust, and the overall attention to and reorganization of present societal structures. Thriving in this context will require tolerance of a diversity of cultures, practices, and world views, as well as the ability to leverage this connectedness.

Environmental Stresses

Along with our many unprecedented technological advances, human society is using up our environment at an unprecedented rate, consuming more of it and throwing more of it away. So far, our technologies have wrung from nature an extraordinary bounty of food, oil, and materials. Scientists calculate that humans use approximately “40 percent of potential terrestrial [plant] production” for themselves (Global Change, 2008). What’s more, we have been mining the remains of plants and animals from hundreds of millions of years ago in the form of fossil fuels in the relatively short period of a few centuries. Without technology, we would have no chance of supporting a population of one billion people, much less seven billion and climbing.

Changing dynamics and demographics will, by necessity, require greater cooperation and sensitivity among nations and cultures. Such needs suggest a reframing of notions of happiness beyond a country’s gross domestic product (a key factor used in analyses of cultural or national quality of life) (Revkin, 2005) and an expansion of business models to include collaboration with a shared spirit of humanity for collective well-being. It also demands that organizations possess an ability to pursue science with an ethical approach to societal solutions

Three Technology Factors

Three technology factors will also condition our future: (1) the rise of smart machines and systems, (2) the explosive growth of data and new media, and (3) the possibility of amplified humans.

The Rise of Smart Machines and Systems

While the creation of new technologies always leads to changes in a society, the increasing development and diffusion of smart machines—that is, technologies that can perform tasks once considered only executable by humans—has led to increased automation and ‘offshorability’ of jobs and production of goods. In turn, this shift creates dramatic changes in the workforce and in overall economic instability, with uneven employment. At the same time, it pushes us toward overdependence on technology—potentially decreasing individual resourcefulness. These shifts have placed an emphasis on non-automatable skills (such as synthesis and creativity), along with a move toward a do-it-yourself maker economy and a proactive human-technology balance (that is, one that permits us to choose what, when, and how to rely on technology).

The Explosive Growth of Data and New Media

The influx of digital technologies and new media has allowed for a generation of “big data” and brings with it tremendous advantages and concerns. Massive data sets generated by millions of individuals afford us the ability to leverage those data for the creation of simulations and models, allowing for deeper understanding of human behavioral patterns, and ultimately for evidence-based decision making.

At the same time, however, such big data production and practices open the door to privacy issues, concerns, and abuses. Harnessing these advantages, while mitigating the concerns and potential negative outcomes, will require better collective awareness of data, with skeptical inquiry and a watchfulness for potential commercial or governmental abuses of data.

The Possibility of Amplified Humans

Advances in prosthetic, genetic, and pharmacological supports are redefining human capabilities while blurring the lines between disability and enhancement. These changes have the potential to create “amplified humans.” At the same time, increasing innovation in virtual reality may lead to confusion regarding real versus virtual and what can be trusted. Such a merging shift of natural and technological requires us to reconceptualize what it means to be human with technological augmentations and refocus on the real world, not just the digital world.


Curricula worldwide have often been tweaked, but they have never been completely redesigned for the comprehensive education of knowledge, skills, character, and meta-learning.

21st century education

In a rapidly changing world, it is easy to get focused on current requirements, needs, and demands. Yet, adequately preparing for the future means actively creating it: the future is not the inevitable or something we are pulled into. There is a feedback loop between what the future could be and what we want it to be, and we have to deliberately choose to construct the reality we wish to experience. We may see global trends and their effects creating the ever-present future on the horizon, but it is up to us to choose to actively engage in co-constructing that future.

For more analysis of the question and implications for education, please see:


Note from FLI: Among our objectives is to inspire discussion and a sharing of ideas. As such, we post op-eds that we believe will help spur discussion within our community. Op-eds do not necessarily represent FLI’s opinions or views.

Effective Altruism 2016

The Effective Altruism Movement

Edit: The following article has been updated to include more highlights as well as links to videos of the talks.

How can we more effectively make the world a better place? Over 1,000 concerned altruists converged at the Effective Altruism Global conference this month in Berkeley, CA to address this very question. For two and a half days, participants milled around the Berkeley campus, attending talks, discussions, and workshops to learn more about efforts currently underway to improve our ability to not just do good in the world, but to do the most good.

Those who arrived on the afternoon of Friday, August 5 had the opportunity to mingle with other altruists and attend various workshops geared toward finding the best careers, improving communication, and developing greater self-understanding and self-awareness.

But the conference really kicked off on Saturday, August 6, with talks by Will MacAskill and Toby Ord, who both helped found the modern effective altruistism movement. Ord gave the audience a brief overview of the centuries of science and philosophy that provided the base for effective altruism. “Effective altruism is to the pursuit of good as the scientific revolution is to the pursuit of truth,” he explained. Yet, as he pointed out, effective altruism has only been a real “thing” for five years.

Will MacAskill

Will MacAskill introduced the conference and spoke of the success the EA movement has had in the last year.

Toby Ord speaking about the history of effective altruism.

Toby Ord spoke about the history of effective altruism.


MacAskill took the stage after Ord to highlight the movement’s successes over the past year, including coverage by such papers as the New York Times and the Washington Post. And more importantly, he talked about the significant increase in membership they saw this year, as well as in donations to worthwhile causes. But he also reminded the audience that a big part of the movement is the process of effective altruism. He said:

“We don’t know what the best way to do good is. We need to figure that out.”

For the rest of the two days, participants considered past charitable actions that had been most effective, problems and challenges altruists face today, and how the movement can continue to grow. There were too many events to attend them all, but there were many highlights.

Highlights From the Conference

When FLI cofounder, Jaan Tallin, was asked why he chose to focus on issues such as artificial intelligence, which may or may not be a problem in the future, rather than mosquito nets, which could save lives today, he compared philanthropy to investing. Higher risk investments have the potential for a greater payoff later. Similarly, while AI may not seem like much of  threat to many people now, ensuring it remains safe could save billions of lives in the future. Tallin spoke as part of a discussion on Philanthropy and Technology.

Jaan Tallin speaking remotely about his work with EA efforts.

Jaan Tallin speaking remotely about his work with EA efforts.

Martin Reese, a member of FLI’s Science Advisory Board, argued that we are in denial of the seriousness of our risks. At the same time, he said that minimizing risks associated with technological advances can only be done “with great difficulty.”  He encouraged EA participants to figure out which threats can be dismissed as science fiction and which are legitimate, and he encouraged scientists to become more socially engaged.

As if taking up that call to action, Kevin Esvelt talked about his own attempts to ensure gene drive research in the wild is accepted and welcomed by local communities. Gene drives could be used to eradicate such diseases as malaria, schistosomiasis, Zika, and many others, but fears of genetic modification could slow research efforts. He discussed his focus on keeping his work as open and accessible as possible, engaging with the public to allow anyone who might be affected by his research to have as much input as they want. “Closed door science,” he added, “is more dangerous because we have no way of knowing what other people are doing.”  A single misstep with this early research in his field could imperil all future efforts for gene drives.

Kevin Esvelt talks about his work with CRISPR and gene drives.

Kevin Esvelt talks about his work with CRISPR and gene drives.

That same afternoon, Cari Tuna, President of the Open Philanthropy Project, sat down with Will McAskill for an interview titled, “Doing Philosophy Better,” which focused on her work with OPP and Effective Altruism and how she envisions her future as a philanthropist. She highlighted some of the grants she’s most excited about, which include grants to Give Directly, Center for Global Development, and Alliance for Safety and Justice. When asked about how she thought EA could improve, she emphasized, “We consider ourselves a part of the Effective Altruism community, and we’re excited to help it grow.” But she also said, “I think there is a tendency toward overconfidence in the EA community that sometimes undermines our credibility.” She mentioned that one of the reasons she trusted GiveWell was because of their self reflection. “They’re always asking, ‘how could we be wrong?'” she explained, and then added, “I would really love to see self reflection become more of a core value of the effective altruism community.”

cari tuna

Cari Tuna interviewed by Will McAskill (photo from the Center for Effective Altruism).

The next day, FLI President, Max Tegmark, highlighted the top nine myths of AI safety, and he discussed how important it is to dispel these myths so researchers can focus on the areas necessary to keep AI beneficial. Some of the most distracting myths include arguments over when artificial general intelligence could be created, whether or not it could be “evil,” and goal-oriented issues. Tegmark also added that the best thing people can do is volunteer for EA groups.

During the discussion about the risks and benefits of advanced artificial intelligence, Dileep George, cofounder of Vicarious, reminded the audience why this work is so important. “The goal of the future is full unemployment so we can all play,” he said. Dario Amodei of OpenAI emphasized that having curiosity and trying to understand how technology is evolving can go a long way toward safety. And though he often mentioned the risks of advanced AI, Toby Ord, a philosopher and research fellow with the Future of Humanity Institute, also added, “I think it’s more likely than not that AI will contribute to a fabulous outcome.” Later in the day, Chris Olah, an AI researcher at Google Brain and one of the lead authors of the paper, Concrete Problems in AI Safety, explained his work as trying to build a bridge to futuristic problems by doing empirical research today.

Moderator Riva-Melissa Tez, Dario Amodei, George Dileep, and Toby Ord at the Risks and Benefits of Advanced AI discussion.

Moderator Riva-Melissa Tez, Dario Amodei, Dileep George, and Toby Ord at the Risks and Benefits of Advanced AI discussion. (Not pictured, Daniel Dewey)

FLI’s Richard Mallah gave a talk on mapping the landscape of AI safety research threads. He showed how there are many meaningful dimensions along which such research can be organized, how harmonizing the various research agendas into a common space allows us to reason about different kinds of synergies and dependencies, and how consideration of the white space in such representations can help us find both unknown knowns and unknown unknowns about the space.

Tara MacAulay, COO at the Centre for Effective Altruism, spoke during the discussion on “The Past, Present, and Future of EA.” She talked about finding the common values in the movement and coordinating across skill sets rather than splintering into cause areas or picking apart who is and who is not in the movement. She said, “The opposite of effective altruism isn’t ineffective altruism. The opposite of effective altruism is apathy, looking at the world and not caring, not doing anything about it . . . It’s helplessness. . . . throwing up our hands and saying this is all too hard.”

MacAulay also moderated a panel discussion called, Aggregating Knowledge, which was significant, not only for its thoughtful content about accessing, understanding, and communicating all of the knowledge available today, but also because it was an all-woman panel. The panel included Sarah Constantin, Amanda Askell, Julia Galef, and Heidi McAnnaly, who discussed various questions and problems the EA community faces when trying to assess which actions will be most effective. MacAulay summarized the discussion at the end when she said, “Figuring out what to do is really difficult but we do have a lot of tools available.” She concluded with a challenge to the audience to spend five minutes researching some belief they’ve always had about the world to learn what the evidence actually says about it.

aggregating knowledge

Sarah Constantin, Amanda Askell, Julia Galef, Heidi McAnnaly, and Tara MacAulay (photo from the Center for Effective Altruism).

Prominent government leaders also took to the stage to discuss how work with federal agencies can help shape and impact the future. Tom Kalil, Deputy Director for Technology and Innovation highlighted how much of today’s technology, from cell phones to Internet, got its start in government labs. Then, Jason Matheny, Director of IARPA, talked about how delays in technology can actually cost millions of lives. He explained that technology can make it less costly to enhance moral developments and that, “ensuring that we have a future counts a lot.”

Tom Kalil speaks about the history of government research and its impact on technology.

Tom Kalil speaks about the history of government research and its impact on technology.

Jason Matheny talks about how employment with government agencies can help advance beneficial technologies.

Jason Matheny talks about how employment with government agencies can help advance beneficial technologies.

Robin Hanson, author of The Age of Em, talked about his book and what the future will hold if we continue down our current economic path while the ability to create brain emulation is developed. He said that if creating ems becomes cheaper than paying humans to do work, “that would change everything.” Ems would completely take over the job market and humans would be pushed aside. He explained that some people might benefit from this new economy, but it would vary, just as it does today, with many more people suffering from poverty and fewer gaining wealth.

Robin Hanson talks to a group about how brain emulations might take over the economy and what their world will look like.

Robin Hanson talks to a group about how brain emulations might take over the economy and what their world will look like.


Applying EA to Real Life

Lucas Perry, also with FLI, was especially impressed by the career workshops offered by 80,000 Hours during the conference. He said:

“The 80,000 Hours workshops were just amazing for giving new context and perspective to work. 80,000 Hours gave me the tools and information necessary to reevaluate my current trajectory and see if it really is best of all possible paths for me and the world.

In the end, I walked away from the conference realizing I had been missing out on something so important for most of my life. I found myself wishing that effective altruism, and organizations like 80,000 Hours, had been a part of my fundamental education. I think it would have helped immensely with providing direction and meaning to my life. I’m sure it will do the same for others.”

In total, 150 people spoke over the course of those two and a half days. MacAskill finally concluded the conference with another call to focus on the process of effective altruism, saying:

“Constant self-reflection, constant learning, that’s how we’re going to be able to do the most good.”


View from the conference.

View from the conference.

Developing Countries Can’t Afford Climate Change

Developing countries currently cannot sustain themselves, let alone grow, without relying heavily on fossil fuels. Global warming typically takes a back seat to feeding, housing, and employing these countries’ citizens. Yet the weather fluctuations and consequences of climate change are already impacting food growth in many of these countries. Is there a solution?

Developing Countries Need Fossil Fuels

Fossil fuels are still the cheapest, most reliable energy resources available. When a developing country wants to build a functional economic system and end rampant poverty, it turns to fossil fuels.

India, for example, is home to one-third of the world’s 1.2 billion citizens living in poverty. That’s 400 million people in one country without sufficient food or shelter (for comparison, the entire U.S. population is roughly 323 million people). India hopes to transition to renewable energy as its economy grows, but the investment needed to meet its renewable energy goals “is equivalent to over four times the country’s annual defense spending, and over ten times the country’s annual spending on health and education.”

Unless something changes, developing countries like India cannot fight climate change and provide for their citizens. In fact, developing countries will only accelerate global warming as their economies grow because they cannot afford alternatives. Wealthy countries cannot afford to ignore the impact of these growing, developing countries.

The Link Between Economic Growth and CO2

According to a World Bank report, “poor and middle-income countries already account for just over half of total carbon emissions.” And this percentage will only rise as developing countries grow. Achieving a global society in which all citizens earn a living wage and climate catastrophe is averted requires breaking the link between economic growth and increasing carbon emissions in developing countries.

Today, most developing countries that decrease their poverty rates also have increased rates of carbon emissions. In East Asia and the Pacific, the number of people living in extreme poverty declined from 1.1 billion to 161 million between 1981 and 2011—an 85% decrease. In this same time period, the amount of carbon dioxide per capita rose from 2.1 tons per capita to 5.9 tons per capita—a 185% increase.

South Asia saw similar changes during this time frame. As the number of people living in extreme poverty decreased by 30%, the amount of carbon dioxide increased by 204%.

In Sub-Saharan Africa, the number of people living in poverty increased by 98% in this thirty-year span, while carbon dioxide per capita decreased by 17%. Given the current energy situation, if sub-Saharan Africans are to escape extreme poverty, they will have to increase their carbon use—unless developing countries step in to offer clean alternatives.

Carbon Emissions Rate Vs. Total

Many wealthier countries have been researching alternative forms of energy for decades. And that work may be starting to pay off.

New data shows that, since the year 2000, 21 developed countries have reduced annual greenhouse gas emissions while simultaneously growing their economies. Moreover, this isn’t all related to a drop in the industrial sector. Uzbekistan, Bulgaria, Switzerland, and the Czech Republic demonstrated that countries do not need to shrink their industrial sectors to break the link between economic growth and increased greenhouse gas emissions.

Most importantly, global carbon emissions stalled from 2014 to 2015 as the global economy grew.

But is this rate of global decoupling fast enough to keep the planet from warming another two degrees Celsius? When emissions stall at 32.1 billion metric tons for two years, that’s still 64.2 billion metric tons of carbon being pumped into the atmosphere over two years.

The carbon emissions rate might fall, but the total continues to grow enormously. A sharp decline in carbon emissions is necessary to keep the planet at a safe global temperature. At the 2015 Paris Climate Conference, the United Nations concluded that in order to keep global temperatures from rising another two degrees Celsius, global carbon emissions “must fall to net zero in the second half of the century.”

In order to encourage this, the Paris agreement included measures to ensure that wealthy countries finance developing countries “with respect to both mitigation and adaptation.” For mitigation, countries are expected to abide by their pledges to reduce emissions and use more renewable energy, and for adaptation, the deal sets a global goal for “enhancing adaptive capacity, strengthening resilience and reducing vulnerability to climate change.”

Incentivizing R&D

One way wealthy countries can benefit both themselves and developing countries is through research and development. As wealthier countries develop cheaper forms of alternative energy, developing countries can take advantage of the new technologies. Wealthy countries can also help subsidize renewable energy for countries dealing with higher rates of poverty.

Yet, as of 2014, wealthy countries had invested very little in this process, providing only 0.2% of developing countries’ GDP for adaptation and mitigation. Moreover, a 2015 paper from the IMF revealed that while we spend $100 billion per year subsidizing renewable energy, we spend an estimated $5.3 trillion subsidizing fossil fuels. This fossil fuel subsidy includes “the uncompensated costs of air pollution, congestion and global warming.”

Such a huge disparity indicates that wealthy countries either need stronger incentives or stronger legal obligations to shift this fossil fuel money towards renewable energy. The Paris agreement intends to strengthen legal obligations, but its language is vague, and it lacks details that would ensure wealthy countries follow through with their responsibilities.

However, despite the shortcomings of legal obligations, monetary incentives do exist. India, for example, wants to vastly increase its solar power capacity to address this global threat. They need $100 billion to fund this expansion, which could spell a huge opportunity for U.S. banks, according to Raymond Vickery, an expert on U.S-India economic ties. This would be a boon for the U.S. economy, and it would set an important precedent for other wealthy countries to assist and invest in developing countries.

However, global leaders need to move quickly. The effects of global warming already threaten the world and the economies of developing countries, especially India.

Global Impact of Climate Change

India relies on the monsoon cycle to water crops and maintain its “nearly $370 billion agricultural sector and hundreds of millions of jobs.” Yet as the Indian Ocean has warmed, the monsoon cycle has become unreliable, resulting in massive droughts and dying crops.

Across the globe, scientists expect developing countries such as India to be hit hardest by rising temperatures and changes in rainfall. Furthermore, these countries with limited financial resources and weak infrastructure will struggle to adapt and sustain their economic growth in the face of changing climate. Nicholas Stern predicts that a two-degree rise in temperature would cost about 1% of world GDP. But the World Bank estimates that it would cost India 5% of their GDP.

Moreover, changes such as global warming act as “threat multipliers” because they increase the likelihood of other existential threats. In India, increased carbon dioxide emissions have contributed to warmer temperatures, which have triggered extensive droughts and increased poverty. But the problems don’t end here. Higher levels of hunger and poverty can magnify political tensions, potentially leading to conflict and even nuclear war. India and Pakistan both have nuclear weapons—if drought expands and cripples their economies, violence can more easily erupt.

Alternatively, wealthy nations could capitalize on investment opportunities in developing countries. In doing so, their own economies will benefit while simultaneously aiding the effort to reach net zero carbon emissions.

Global warming is, by definition, a global crisis. Mitigating this threat will require global cooperation and global solutions.

Analysis: Clopen AI – Openness in Different Aspects of AI Development

Clopen AI – which aspects of artificial intelligence research should be open, and which aspects should be closed, to keep AI safe and beneficial to humanity?

There has been a lot of discussion about the appropriate level of openness in AI research in the past year – the OpenAI announcement, the blog post Should AI Be Open?, a response to the latter, and Nick Bostrom’s thorough paper Strategic Implications of Openness in AI development.

There is disagreement on this question within the AI safety community as well as outside it. Many people are justifiably afraid of concentrating power to create AGI and determine its values in the hands of one company or organization. Many others are concerned about the information hazards of open-sourcing AGI and the resulting potential for misuse. In this post, I argue that some sort of compromise between openness and secrecy will be necessary, as both extremes of complete secrecy and complete openness seem really bad. The good news is that there isn’t a single axis of openness vs secrecy – we can make separate judgment calls for different aspects of AGI development, and develop a set of guidelines.

Information about AI development can be roughly divided into two categories – technical and strategic. Technical information includes research papers, data, source code (for the algorithm, objective function), etc. Strategic information includes goals, forecasts and timelines, the composition of ethics boards, etc. Bostrom argues that openness about strategic information is likely beneficial both in terms of short- and long-term impact, while openness about technical information is good on the short-term, but can be bad on the long-term due to increasing the race condition. We need to further consider the tradeoffs of releasing different kinds of technical information.

Sharing papers and data is both more essential for the research process and less potentially dangerous than sharing code, since it is hard to reconstruct the code from that information alone. For example, it can be difficult to reproduce the results of a neural network algorithm based on the research paper, given the difficulty of tuning the hyperparameters and differences between computational architectures.

Releasing all the code required to run an AGI into the world, especially before it’s been extensively debugged, tested, and safeguarded against bad actors, would be extremely unsafe. Anyone with enough computational power could run the code, and it would be difficult to shut down the program or prevent it from copying itself all over the Internet.

However, releasing none of the source code is also a bad idea. It would currently be impractical, given the strong incentives for AI researchers to share at least part of the code for recognition and replicability. It would also be suboptimal, since sharing some parts of the code is likely to contribute to safety. For example, it would make sense to open-source the objective function code without the optimization code, which would reveal what is being optimized for but not how. This could make it possible to verify whether the objective is sufficiently representative of society’s values – the part of the system that would be the most understandable and important to the public anyway.

It is rather difficult to verify to what extent a company or organization is sharing their technical information on AI development, and enforce either complete openness or secrecy. There is not much downside to specifying guidelines for what is expected to be shared and what isn’t. Developing a joint set of openness guidelines on the short and long term would be a worthwhile endeavor for the leading AI companies today.

(Cross-posted from my blog. Thanks to Jelena Luketina and Janos Kramar for their detailed feedback on this post.)

Note from FLI: Among our objectives is to inspire discussion and a sharing of ideas. As such, we post op-eds that we believe will help spur discussion within our community. Op-eds do not necessarily represent FLI’s opinions or views.

Image source:

Op-ed: Being Alarmed Is Not the Same as Being an Alarmist

When the evidence clearly suggests that we’re heading toward a catastrophe, scientists shouldn’t hesitate to make their feelings known to the public. So, at what point should scientists begin to publicly worry about the environment?

Scientists are trained to report their findings in a disinterested manner. The aim is to be as objective as possible, and this means bracketing one’s feelings in favor of the facts.

But what happens when the evidence suggests that humanity is racing towards a global, irreversible disaster? What happens when the results of scientific inquiry clearly warrant activism in favor of a particular law or policy?

Once in a while, scientists do express their personal thoughts about the results of scientific research. For example, in 2012, a geophysics researcher from the University of San Diego, Brad Werner, gave a presentation at the large, annual American Geophysical Union conference. His talk was titled “Is Earth F**cked?,” and as he told a reporter for iO9 afterwards, the answer is “more or less.”

Two years later, after a group of scientists found “vast methane plumes escaping from the seafloor,” the glaciologist Jason Box echoed Werner’s pessimism, tweeting: “If even a small fraction of Arctic sea floor carbon is released to the atmosphere, we’re f ’d.”

Rewriting Records

There’s good reason for scientists to be honest and open about the implications of their research. The environmental situation today really is dire.

According to Gavin Schmidt of NASA’s Goddard Institute of Space Studies, there’s a 99% probability that 2016 will become the hottest year on record, surpassing the previous record set by 2015, which itself surpassed the previous record set by 2014. In fact, the hottest 16 years have all occurred since 2000, with only a single exception (1998).

Even more, last June was the 14th consecutive month to set a temperature record. And in July, Kuwait experienced the highest temperature ever recorded in the Eastern hemisphere, with temperatures reaching 129.2 degrees (F). In nearby Iraq, the mercury peaked at 129.0 degrees. As Jason Samenow notes, “It’s also possible that [the] 129.2-degree reading matches the hottest ever reliably measured anywhere in the world” (italics added).

Meanwhile, the amount of carbon dioxide in the atmosphere continues to climb at a meteoric rate. Before the Industrial Revolution, the concentration was 280 parts per million (ppm). But recent years have seen it surpass 400 ppm. Initially, this has occurred for only  part of the year because of the seasonal life cycles of plants, which remove atmospheric carbon dioxide.

Last year, though, the average concentration of carbon dioxide exceeded 400 ppm for the first time ever. And scientists are now saying that “carbon dioxide will never fall below 400 ppm this year, nor the next, nor the next.” In other words, no human alive today will ever again experience an atmosphere with less than 400 ppm. As the meteorologist Richard Betts puts it, “These numbers are … a reminder of the long-term effects we’re having on the system.”

Worrisome Weather

Along with record-breaking temperatures and changes to atmospheric chemistry, recent months have seen many extreme weather events. This is in part due to the 2015-2016 El Niño climate cycle, which has been “probably the most powerful in the last 100 years.”

But the more fundamental driver of extreme weather is climate change. Research shows that climate change will result in more severe floods, droughts, heat waves, and hurricanes. According to a study conducted by scientists at NASA, Cornell, and Columbia universities, we should expect “megadroughts” in the US lasting decades.

Another study predicts that certain regions could experience heat waves so scorching that “one would overheat even if they were naked in the shade, soaking wet and standing in front of a large fan.” Yet another report found that lightning strikes will increase by 50% this century.

Until recently, it was difficult for climatologists to link particular instances of extreme weather with human-caused changes to the climate. Asking whether climate change caused event X is like asking whether smoking caused Jack’s lung cancer. A doctor can explain that Jack-the-smoker is statistically more likely to get cancer than Jack-the-nonsmoker. However, a direct link is indiscernible.

But this situation is changing, as a recent report from the National Academy of Sciences affirms. Scientists are increasingly able to connect climate change with particular instances of extreme weather. And the results are worrisome.

For example, a study from last year links climate change to the 2007-2010 Syrian drought. This record-breaking event fueled the Syrian civil war by instigating a large migration of farmers into Syria’s urban centers. Furthermore, this conflict gave rise to terrorist groups like the Islamic State and Jabhat al-Nusra (al-Qaeda’s Syrian affiliate). In other words, one can trace an unbroken series of causes from climate change to the Syrian civil war to terrorism.

Panicking in Public

Climate change is a clear and present danger. Scientists don’t debate about whether it’s occurring. Nor do they disagree that its consequences will be global, catastrophic, and irreversible. According to the World Bank, “the global community is not prepared for a swift increase in climate change-related natural disasters — such as floods and droughts — which will put 1.3 billion people at risk by 2050.”

Given the high stakes and the well-established science, scientists should be waving their arms and shouting, “The situation is urgent! We must act now! The future of civilization depends upon it!” In the process, they should take care to distinguish between the distinct attitudes of “being alarmed” and “being an alarmist,” which many pundits, politicians, and journalists often conflate. The first occurs when one responds proportionally to the best available evidence. The second is what happens when one’s fear and anxiety go beyond the evidence.

Being alarmed is the appropriate response to an alarming situation, and the situation today really is alarming.

The ongoing catastrophe of climate change is not out of our control. But if we don’t act soon, Werner could be right that Earth is, well, in bad shape.

Note from FLI: Among our objectives is to inspire discussion and a sharing of ideas. As such, we post op-eds that we believe will help spur discussion within our community. Op-eds do not necessarily represent FLI’s opinions or views.