
Using CRPS to Evaluate Recast Models

What are probabilistic forecasts?

We care about two things when we’re making forecasts: how “correct” the forecast is (how far the actuals landed from the predicted values) and how well we understand our own uncertainty (we want to penalize a forecast less if it was said to be uncertain upfront).

For example, let’s imagine that last quarter you did a 6-month forecast of revenue using your media mix model (MMM). Instead of giving you a single number for the forecasted revenue (like $20,543,758), your MMM ideally will also express its uncertainty as a distribution of forecasted revenues and how likely each is. The taller the distribution at a revenue value, the more likely that revenue number is, according to your forecast. Here, $20,543,758 is the most likely value.

Normally, with a single forecasted revenue value, you could just check how close it is to your actual revenue. The model said $20,543,758 and you made $20,143,713, so that seems pretty accurate. But if your forecast doesn’t give you a single number, how do you know whether the forecast is correct?

What is a good probabilistic forecast?

First, let’s conceptualize what would make a good probabilistic forecast in the first place. 

  1. Correctness: on average the predicted values are close to the actual value
  2. Precision: there’s high certainty in the predictions

We need both of these things to have a good probabilistic forecast. If the forecast is correct but not precise, then our forecast range will be too wide to give us any actionable information. Imagine how hard it would be to plan for your next quarter if your forecasted revenue is between $10 and $50,000,000!

If a forecast is precise but not correct, then we’ll confidently take action on an incorrect prediction. Imagine your model tells you that your revenue will be between $10,000,000 and $10,050,000 and it’s actually $8,000,000. 

CRPS

The Continuous Ranked Probability Score (CRPS) measures how accurate probabilistic forecasts are. Rather than comparing an observed value with a point prediction (a single number), CRPS compares the observed value with a probabilistic prediction (a distribution of possible predictions). CRPS can be thought of as an extension of Mean Absolute Error (MAE) where predictions are given as a distribution of possible values, rather than a single value. Thus, just like with MAE, lower CRPS values indicate better model performance.
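Because CRPS extends MAE to distributions, it can be estimated directly from a set of forecast draws. Below is a minimal sketch in Python, assuming your MMM’s forecast arrives as an array of Monte Carlo samples; the function name and revenue figures (reused from the example above) are ours for illustration, and the identity used is the standard sample-based CRPS estimator $\mathrm{CRPS}(F, x) \approx \frac{1}{n}\sum_i |X_i - x| - \frac{1}{2n^2}\sum_{i,j} |X_i - X_j|$.

```python
import numpy as np

def crps_from_samples(samples, observed):
    """Sample-based CRPS estimate: E|X - x| - 0.5 * E|X - X'|.

    `samples` are draws from the forecast distribution F;
    `observed` is the actual outcome x. Lower is better.
    """
    samples = np.asarray(samples, dtype=float)
    # Average distance between the forecast draws and the actual value
    accuracy = np.mean(np.abs(samples - observed))
    # Half the mean absolute difference between pairs of draws (sharpness term)
    sharpness = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return accuracy - sharpness

# Hypothetical forecast: draws centered on $20,543,758; actual revenue $20,143,713.
rng = np.random.default_rng(42)
forecast_draws = rng.normal(20_543_758, 1_000_000, size=2_000)
print(f"CRPS: ${crps_from_samples(forecast_draws, 20_143_713):,.0f}")
```

Like MAE, the result is in the same units as the forecast (dollars here), which makes it easy to interpret.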

Comparing CRPS vs R-Squared, MAPE, and MAE

Imagine you’re playing darts, but you don’t know exactly where the dart will land — there’s some randomness in your throw.

With R-squared, MAPE, and MAE:

You make a single guess, “I think it’ll land right here,” and then throw the dart. Then you measure how far away the dart is from the target and that’s your “error”.

R-squared, MAPE, and MAE score you only on how close that one shot was to the target. If you guessed the wrong spot but the dart landed in an area you thought was still possible, you get no extra credit.

With CRPS:

Instead of one guess, you draw a whole “probability map” on the dartboard before throwing — shading areas darker where you think the dart is more likely to land, and lighter where it’s less likely.

• If the dart lands in a dark, heavily shaded area, you score well.

• If it lands in a lightly shaded or blank area, you’re penalized.

In other words, R-squared, MAPE, and MAE care only about the single predicted spot, while CRPS considers the whole range of possible outcomes and their likelihoods.

CRPS Deep Dive

Before we dig into the technical definition of CRPS, we can look at it visually. The plot below shows two cumulative distribution functions (CDFs). While that might sound complicated, a CDF is easy to read. On the x-axis is a value in our distribution (like $20M, $10M, or $32.5M) and on the y-axis is the proportion of the distribution that is at or below that value. We want most of our predictive distribution to fall near the correct value. So, visually, we can judge the performance of our predictions based on how much shaded area there is between the CDF of the predictions and the solid curve showing the correct value.
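If your forecast is a set of draws rather than a formula, the CDF is just the proportion of draws at or below each value. A quick sketch (the draws, values, and `empirical_cdf` helper are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
draws = rng.normal(20.5e6, 1.0e6, size=10_000)  # hypothetical forecast draws

def empirical_cdf(draws, value):
    """Proportion of the forecast distribution at or below `value`."""
    return np.mean(draws <= value)

print(empirical_cdf(draws, 20.0e6))  # ~0.31: about 31% of draws fall at or below $20M
print(empirical_cdf(draws, 32.5e6))  # ~1.00: essentially the whole distribution is below $32.5M
```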

CRPS not only takes into account how far away a model’s average guess is from the observed value, but also how uncertain the model was about the prediction. It is the integrated squared difference between the cumulative distribution function of our predictive distribution and an indicator function at the actual value:

$$\mathrm{CRPS}(F, x) = \int_{-\infty}^{\infty} \left( F(y) - \mathbf{1}\{y \geq x\} \right)^2 \, \mathrm{d}y$$

where $F(y)$ is the cumulative distribution function of our prediction distribution, and $\mathbf{1}\{y \geq x\}$ is an indicator (step) function that equals $0$ when $y \lt x$ and $1$ elsewhere. The integral $\int_{-\infty}^{\infty} \dots \mathrm{d}y$ adds up the squared gap between these two curves across all values of $y$. We square the difference so that under- and over-predicting are treated as equally bad.
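To make the integral concrete, here is a minimal sketch that evaluates it numerically for a Normal predictive distribution; the Normal assumption and the parameter values are ours for illustration, and we check the result against the known closed form for Gaussian CRPS.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import trapezoid

def crps_by_integration(mu, sigma, x):
    """CRPS for a Normal(mu, sigma) forecast via numerical integration
    of (F(y) - 1{y >= x})^2 dy over a wide grid."""
    y = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200_001)
    F = norm.cdf(y, loc=mu, scale=sigma)  # forecast CDF
    step = (y >= x).astype(float)         # indicator at the actual value x
    return trapezoid((F - step) ** 2, y)

def crps_normal_closed_form(mu, sigma, x):
    """Closed-form CRPS for a Normal(mu, sigma) forecast."""
    z = (x - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

print(crps_by_integration(0.0, 1.0, 0.5))      # ~0.331
print(crps_normal_closed_form(0.0, 1.0, 0.5))  # should agree closely
```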

Note: when our prediction is a point estimate rather than a distribution, the forecast CDF is itself a step function. The plot then reduces to the area between two indicator functions, and the CRPS is simply the absolute error between the prediction and the actual value, i.e., the MAE.
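We can check this with the sample-based estimator from the earlier sketch: a degenerate “distribution” with all of its mass on a single point recovers the plain absolute error (values are the hypothetical ones from above).

```python
# Reusing crps_from_samples and np from the earlier sketch:
point_forecast = np.full(2_000, 20_543_758.0)         # all mass on one value
print(crps_from_samples(point_forecast, 20_143_713))  # 400045.0 == |20,543,758 - 20,143,713|
```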

There are two ways to make this area (and therefore the CRPS) smaller/better:

  • Improve Correctness: make the average prediction closer to the observed value
  • Improve Precision: make the prediction distribution narrower (more certain)

The plot below shows three different prediction distributions each with the same average prediction, but with different levels of uncertainty around the prediction (red: low uncertainty, yellow: moderate uncertainty, green: high uncertainty).

We can see visually that more uncertainty leads to a larger shaded area and therefore a higher (worse) CRPS.
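The same pattern shows up numerically. Using the closed-form Gaussian CRPS from the sketch above, with made-up means and standard deviations standing in for the three distributions in the plot:

```python
# Same average prediction, increasing uncertainty; the observation sits at the mean.
for label, sigma in [("red (low)", 0.5), ("yellow (moderate)", 1.0), ("green (high)", 2.0)]:
    print(label, crps_normal_closed_form(mu=20.0, sigma=sigma, x=20.0))
# CRPS doubles with each doubling of sigma (~0.12, ~0.23, ~0.47),
# even though the average prediction is exactly right.
```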

Conclusion

Probabilistic forecasts are powerful because they allow you to use probability to quantify the uncertainty you have about your forecasts. This empowers you to make informed decisions, both in terms of what action to take and in terms of whether your forecast is actionable at all. But when you’re no longer getting a prediction that’s a single value, it can be difficult to know whether your forecast was correct. CRPS gives you a single measure that reflects both the Correctness and the Precision of your forecast. While many people are familiar with why correctness is important, one thing CRPS is great at is drawing more attention to precision. We not only want to be correct, but also precise, so that we can confidently make decisions based on our analyses!
