Predicting the Future (of Life)

It’s often said that the future is unpredictable. Of course, that’s not really true. With extremely high confidence, we can predict that the sun will rise in Santa Cruz, California at 7:12 am local time on Jan 30, 2016. We know the next total solar eclipse over the U.S. will be August 21, 2017, and we also know there will be one June 25, 2522.

We can predict next year’s U.S. GDP to within a few percent, what will happen if we jump off a bridge, the Earth’s mean temperature to within a degree, and many other things. Just about every decision we make implicitly includes a prediction for what will occur given each possible choice, and much of the time we make these predictions unconsciously and actually pretty accurately. Hide, or pet the tiger? Jump over the ravine, or climb down? Befriend, or attack? Our mind contains a number of systems comprising quite a sophisticated prediction engine that keeps us alive by helping make good choices in a complex world.

Yet we’re often frustrated at our inability to predict better. What does this mean, precisely? It’s useful to break prediction accuracy into two components: resolution and calibration. Resolution measures how close your predictions are to 0% or 100% likelihood. Thus the prediction that a fair coin will land heads-up 50% of the time has very low resolution. However, if the coin is fair, this prediction has excellent calibration, meaning that the predicted probability is very close to the relative frequency of heads in a long series of coin flips. On the other hand, many people made confident predictions that they “would win” (i.e. probability 1) the recent $1.5 billion Powerball lottery. These predictions had excellent resolution but terrible calibration.
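To make the two components concrete, here is a minimal Python sketch using the informal definitions above. The function names `calibration_table` and `resolution_score` are hypothetical, and the data are invented to mirror the coin and lottery examples. Note that formal forecast verification defines resolution somewhat differently; this follows the "distance from 0% or 100%" description used here.

```python
from collections import defaultdict

def calibration_table(forecasts):
    """Map each stated probability to the observed frequency of the event.

    `forecasts` is a list of (stated_probability, outcome) pairs,
    where outcome is 1 if the event happened and 0 if it did not.
    Good calibration means each key is close to its value.
    """
    groups = defaultdict(list)
    for probability, outcome in forecasts:
        groups[probability].append(outcome)
    return {p: sum(outcomes) / len(outcomes) for p, outcomes in groups.items()}

def resolution_score(forecasts):
    """Average distance of stated probabilities from 50%, scaled to [0, 1].

    0 means every forecast was 50%; 1 means every forecast was 0% or 100%.
    """
    return sum(abs(p - 0.5) * 2 for p, _ in forecasts) / len(forecasts)

# Fair coin: forecast 50% heads on each of 100 flips, and half come up heads.
coin = [(0.5, 1), (0.5, 0)] * 50

# Lottery: 100 players each certain they will win, and none of them do.
lottery = [(1.0, 0)] * 100

print(calibration_table(coin), resolution_score(coin))        # {0.5: 0.5} 0.0
print(calibration_table(lottery), resolution_score(lottery))  # {1.0: 0.0} 1.0
```

The coin forecasts are perfectly calibrated (stated 50%, observed 50%) but have no resolution, while the lottery forecasts have maximal resolution and the worst possible calibration (stated 100%, observed 0%).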

When we say we can’t predict the future, there are generally a few different things we mean. Sometimes we have excellent calibration but poor resolution. A good blackjack player knows the odds of hitting 21, but the fun of the game (and the casino’s profits) relies on nobody having better resolution than the betting odds provide. Weather reports provide a less fun example of generally excellent calibration with resolution that is never as good as we would like. While lack of resolution is frustrating, having good calibration allows good decision-making because we can compute reliable expected values based on well-calibrated probabilities.

Insurance companies don’t know who is going to get in an accident, but can set sensible policies and costs based on statistical probabilities. Startup investors can value a company based on expected future earnings that combine the (generally low) probability of success with the (generally high) value if successful. And Effective Altruists can try to do the most expected good — or try to mitigate the most expected harm — by weighting the magnitude of various occurrences by the probability of their occurrence.
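As a rough illustration of the expected-value reasoning in these examples, the Python sketch below weights each outcome's value by its probability. The function name `expected_value` and every number in it are invented for illustration; they are not real estimates for any company.

```python
def expected_value(outcomes):
    """Return the sum of probability * value over (probability, value) pairs.

    The probabilities must cover all outcomes, so they should sum to 1.
    """
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * value for p, value in outcomes)

# Hypothetical startup: a 5% chance of being worth $200 million,
# and a 95% chance of being worth nothing.
startup = [(0.05, 200_000_000), (0.95, 0)]

print(f"Expected value: ${expected_value(startup):,.0f}")
```

The answer is only as reliable as the probabilities fed in, which is why calibration matters more than resolution for this kind of decision.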

What are much, much less useful are predictions with poor, unknown, or non-existent calibration. These, alas, are what we get a lot of from pundits, from self-appointed (or even real) experts, and even in many quantitative predictive studies. I find this maximally frustrating in discussions of catastrophic or existential risk. For example:

Concerned Scientist: I’m a little worried that unprecedentedly rapid global warming could lead to a runaway greenhouse effect and turn Earth into Venus.

Unconcerned Expert: Well, I don’t think that’s very likely.

Concerned Scientist: Well, if it happens, all of humanity and life on Earth, and all humans who might ever live over countless eons will be extinguished. What would you say the probability is?

Unconcerned Expert: Look, I really don’t think it’s very likely.

Concerned Scientist: 0.0001%?  1%?  10%?

Unconcerned Expert: Look, I really don’t think it’s very likely.

Now, if you try to weigh this worry against:

“I’m a little worried that one of the dozens of near-miss nuclear accidents likely to happen over the next twenty years will lead to a nuclear war.”


“I’m a little worried that an apocalyptic cult could use CRISPR and published gain-of-function research to bioengineer a virus to cleanse the Earth of all humans.”


“I’m a little worried that a superintelligent AI could decide that human decision-makers are too large a source of uncertainty and should be converted into paperclips for predictable mid-level bureaucrats to use.”

Then you can see that the Unconcerned Expert’s response is not useful: it provides no guidance whatsoever into what level of resources should be targeted at reducing this risk, either absolutely or relative to other existential risks.

People do not generally have a good intuition for small probabilities, and to be fair, computing them can be quite challenging.  A small probability generally suggests that there are many, many possible outcomes, and reliably identifying, characterizing, and assessing these many outcomes is very hard.  Take the probability that the US will become engaged in a nuclear war.  It’s (we hope!!) quite small.  But how small?  There are many routes by which a nuclear war might happen, and we’d need to identify each route, break each route into components, and then assign probabilities to each of these components.  For example, inspired by the disturbing sequence of events envisioned in this Vox article, we might ask:

1: Will there be significant ethnic Russian protests in Estonia in the next 5 years?

2: If there are protests, will NATO significantly increase its military presence there?

3: If there are protests, do 10 or more demonstrators die in the protests?

4: If there are increased NATO forces and violent protests, does violence escalate into a military conflict?


Each of these component questions is much easier to address, and together can indicate a reasonably well-calibrated probability for one path toward nuclear conflict. This is not, however, something we can generally do ‘on the fly’ without significant thought and analysis.
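A minimal Python sketch of how the component answers might be combined into a probability for this one path. Everything here is an assumption for illustration: the function name `path_probability` is hypothetical, the numbers are placeholders rather than estimates, and multiplying all four values treats questions 2 and 3 as independent of each other once protests are given, which a real analysis would need to check.

```python
import math

def path_probability(conditional_probabilities):
    """Multiply a chain of conditional probabilities into one path probability.

    Each entry is the probability of a step given that the earlier steps
    on the path have already happened.
    """
    for p in conditional_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return math.prod(conditional_probabilities)

# Placeholder values only, NOT real estimates.
steps = [
    0.20,  # 1: significant protests within 5 years
    0.50,  # 2: NATO increases its presence, given protests
    0.10,  # 3: 10 or more demonstrators die, given protests
    0.05,  # 4: escalation to military conflict, given 2 and 3
]

print(f"Probability of this path: {path_probability(steps):.4%}")
```

Because the result is a product, an error in any single step carries straight through to the final figure, so each component deserves its own careful estimate.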

What if we do put the time and energy into assessing these sequences of possibilities? Assigning probabilities to these chains of mutually exclusive possibilities would create a probability map of a tiny portion of the landscape of possible futures. Somewhat like ancient maps, this map must be highly imperfect, with significant inaccuracies, unwarranted assumptions, and large swathes of unknown territory. But a flawed map is much better than no map!

Some time back, when pondering how great it would be to have a probability map like this, I decided it would require a few ingredients. 

First, it would take a lot of people combining their knowledge and expertise. The world — and the set of issues at hand — is a very complex system, and even enumerating the possibilities, let alone assigning likelihoods to them, is a large task. Fortunately, there are good precedents for crowdsourced efforts: Wikipedia, Quora, Reddit, and other efforts have created enormously valuable knowledge bases using the aggregation of large numbers of contributions.

Second, it would take a way of identifying which people are really good at making predictions. Many people are terrible at it, but finding those who excel at predicting, and aggregating their predictions, might lead to quite accurate ones. Here also, there is very encouraging precedent. The Aggregative Contingent Estimation project run by IARPA, one component of which is the Good Judgment Project, has created a wealth of data indicating that (a) prediction is a trainable, identifiable, persistent skill, and (b) by combining predictions, well-calibrated probabilities can be generated for even complex geopolitical events.
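One simple way to picture "finding those who excel and aggregating their predictions" is to score each forecaster's track record and weight their new forecasts accordingly. The Python sketch below does this with the Brier score (mean squared error between stated probabilities and outcomes). This is an illustrative assumption, not the method used by the ACE program or by Metaculus; the function names and all data are hypothetical.

```python
def brier_score(history):
    """Mean squared error between stated probabilities and 0/1 outcomes.

    `history` is a list of (stated_probability, outcome) pairs.
    Lower is better: 0 is perfect, 1 is confidently wrong every time.
    """
    return sum((p - outcome) ** 2 for p, outcome in history) / len(history)

def aggregate(forecasts, histories):
    """Weighted mean of forecasts, weighting each forecaster by 1 - Brier score."""
    weights = {name: 1.0 - brier_score(histories[name]) for name in forecasts}
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("no forecaster has a usable track record")
    return sum(weights[name] * p for name, p in forecasts.items()) / total_weight

# Invented track records: (stated probability, what actually happened).
histories = {
    "alice": [(0.9, 1), (0.1, 0)],  # confident and right
    "bob": [(0.9, 0), (0.1, 1)],    # confident and wrong
}

# Their forecasts on a new yes/no question.
forecasts = {"alice": 0.8, "bob": 0.2}

print(f"Aggregated probability of yes: {aggregate(forecasts, histories):.2f}")
```

The combined forecast lands much closer to alice's 80% than to bob's 20%, because her past predictions scored far better.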

Finally, we’d need a system to collect, optimally combine, calibrate, and interpret all of the data. This was the genesis of the idea for Metaculus, a new project I’ve started with several other physicists. Metaculus is quickly growing and evolving into a valuable tool that can help humanity make better decisions. I encourage you to check out the open questions and predictions, and make your own forecasts. It’s fun, and even liberating to stop thinking of the future as unknowable, and to start exploring it instead!



7 replies
  1. Mindey
    Mindey says:

    The idea looks fine and needed, but the current implementation is boring. It would be fun to give interval estimates instead of “Yes/No” answers to statements like “X is more likely than ¬X?” Yes, by assembling a larger crowd, you could get good approximations from “Yes/No” answers, but by asking this, you are collecting just a small fraction of the information that a human could provide about his/her belief regarding an event.

  2. Anthony Aguirre
    Anthony Aguirre says:

    Hi Mindey,

    Right now what Metaculus asks for is a ‘probability’ (1%-99%) of ‘yes’. We definitely plan to expand this in the future to

    (a) multiple outcomes, where you would assign a probability to each outcome (and even a continuous set of outcomes), and
    (b) contingent questions of the form “given A, what are the probabilities of B1, B2, B3, or B4 occurring?”

    We could in principle allow people to express “I think the probability is between 25 and 40%”, but I don’t think that would add that much.

    We could also allow people to say “I think the value will be between X and Y”. This would then be a special case of multiple outcomes, where you give a finite probability to outcomes between X and Y, and zero probability outside this range. So I think that will be included when we expand to multi-outcome questions.

    We’re hoping that some of the fun parts will be explaining/arguing about your prediction (you can post your explanation with your prediction) and also actually playing the ‘game’ and seeing how well your predictions do versus reality. If you have other additional ideas for making it more exciting, we’d love to hear!

  3. Kptn Blizz
    Kptn Blizz says:

    the whole chitchat on prediction is pretty nonscientific and way more the domain of clairvoyants, futurists and of course venture capitalists, who aim to promote their business, than sound science which has serious interests in understanding the dynamics of complex adaptive systems of our environment. sound science would never claim to predict the future but describe conceivable scenarios that might happen if all assumptions come true that represent conditions necessary for the scenario to occur. most assumptions necessary lie ahead itself.

    • Kptn Blizz
      Kptn Blizz says:

      p.s. Here’s the formal structure of sound statements on future events:
      Given the known data D and accepted theories T the future event E will occur – if also the assumptions A are true, which cannot completely be verified at present time.

      • Anthony
        Anthony says:

        Kptn Blizz,

        I basically agree with your ‘formal structure’, though I would add that (a) all predictions are fundamentally statistical, and (b) relatedly, you never have perfect knowledge of D or T, or full confidence in A. So all predictions are somewhat imprecise. In some cases (such as eclipses), the prediction window is so peaked and narrow that it takes the form of a single prediction, i.e. fantastic ‘resolution.’

        In Metaculus (or anything similar), the user is in effect supplying much of D, T, and A individually. This is in fact what we do in most predictions we make on a minute-by-minute basis. This is not nearly as accurate as when we make predictions based on mathematical theories, consensus assumptions, and experimentally-gathered data. But it can nonetheless be quite effective. I’d encourage you to look at the literature for the IARPA ACE program, e.g.

        Oh, and if there are clairvoyants out there, they are especially invited to try their hand at Metaculus — they should be able to rack up a massive track record.

        Futurists and Venture Capitalists are also of interest, though.

  4. Lawrence Chan
    Lawrence Chan says:

    How is this different from Tetlock’s Good Judgment Project/other similar prediction tournaments that came from IARPA ACE?

    • Anthony Aguirre
      Anthony Aguirre says:

      Metaculus drew a lot of inspiration from the IARPA ACE publications! Good Judgment Open is pretty similar in current implementation. Both are somewhat different from many prediction ‘markets’ with real (or pretend) currency. The key differences are, I think,

      (a) Metaculus is much more focused on scientific and technical issues, whereas GJ (and many other markets) are more focused on politics, geopolitical events, sports, finance, etc. Metaculus will likely expand in the future, but its core will, I expect, always be more focused on scientific and technical issues. Its design, feel, and attitude all reflect this orientation (as do its current user base and question pool).

      (b) While I don’t know what other projects plan to do, I know that our plans include lots of much more interesting analytics and connections between questions in the future, to really build out a Bayes network or similar structure, and potentially fold in other data sources, machine learning techniques, etc.
