National Weather Service United States Department of Commerce

Statistical Verification Score Definitions


Mean Absolute Error (MAE)

The Mean Absolute Error (MAE) is a measure of forecast accuracy. Smaller values indicate better forecasts; a perfect score is zero. MAE is defined as:

\[ \frac{1}{N} \sum_{i=1}^{N} \left| \mathrm{forecast}_i - \mathrm{observation}_i \right| \]

where N = the total number of observations
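
As a rough illustration, the MAE calculation above can be sketched in Python. The function name and the sample temperatures are hypothetical and are not taken from the verification pages; this is a minimal sketch, not the code used to produce the scores.

```python
def mean_absolute_error(forecasts, observations):
    """Average of |forecast - observation| over all N matched pairs."""
    if len(forecasts) != len(observations):
        raise ValueError("forecasts and observations must have the same length")
    if not forecasts:
        raise ValueError("at least one forecast/observation pair is required")
    total = sum(abs(f - o) for f, o in zip(forecasts, observations))
    return total / len(forecasts)


# Hypothetical max-temperature forecasts and observations (degrees F).
forecasts = [70, 65, 80, 75]
observations = [68, 67, 80, 71]

# Absolute errors are 2, 2, 0, and 4, so MAE = 8 / 4.
mae = mean_absolute_error(forecasts, observations)
print(mae)  # 2.0
```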

On our page, gridded verification scores include MAE scores. Point verification scores include MAE scores for the following weather elements: Max Temp, Min Temp, Sfc Temp, Dew Point, Wind Speed, Wind Direction, Wind Gust, and Relative Humidity.


Bias

The mean algebraic error (bias) indicates whether forecasts of a given parameter are, on average, too high or too low. For example, a positively biased temperature forecast indicates that forecasts were, on average, too warm, and a negatively biased temperature forecast indicates that forecasts were, on average, too cool. Likewise, a positively biased wind speed forecast indicates that forecasts were, on average, predicting wind speeds that were too high. A bias of zero is possible if a forecaster's over-forecasting and under-forecasting cancel each other or if the forecast is perfect, so bias should be examined together with MAE to determine forecasting error. The bias is defined as:

\[ \frac{1}{N} \sum_{i=1}^{N} \left( \mathrm{forecast}_i - \mathrm{observation}_i \right) \]

where N = the total number of observations
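
The following Python sketch illustrates why bias is read together with MAE: errors of opposite sign cancel in the bias but not in the MAE. The function names and sample values are hypothetical and chosen only for illustration.

```python
def bias(forecasts, observations):
    """Mean of (forecast - observation); positive means forecasts ran too high."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need equal-length, non-empty forecasts and observations")
    return sum(f - o for f, o in zip(forecasts, observations)) / len(forecasts)


def mean_absolute_error(forecasts, observations):
    """Mean of |forecast - observation|; errors of opposite sign do not cancel."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need equal-length, non-empty forecasts and observations")
    return sum(abs(f - o) for f, o in zip(forecasts, observations)) / len(forecasts)


# Hypothetical temperature forecasts and observations (degrees F).
forecasts = [72, 66, 83, 71]
observations = [70, 68, 80, 74]

# Errors are +2, -2, +3, -3: they cancel in the bias but not in the MAE.
sample_bias = bias(forecasts, observations)
sample_mae = mean_absolute_error(forecasts, observations)
print(sample_bias)  # 0.0
print(sample_mae)   # 2.5
```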

On our page, gridded verification scores include bias scores. Point verification scores include bias scores for the following weather elements: Max Temp, Min Temp, Sfc Temp, Dew Point, Wind Speed, Wind Gust, and Relative Humidity.


Brier Score

The Brier Score is the mean squared error applied to probability forecasts. These verification web pages use a common form of the score, called the half-Brier score, which is defined as follows:

\[ \frac{1}{N} \sum_{i=1}^{N} \left( \mathrm{forecast}_i - \mathrm{observation}_i \right)^2 \]

where N = the total number of observations

  • The probability forecasts used in these computations have 1% precision.
  • The observation is set equal to one if precipitation greater than or equal to 0.01 inches occurred, or to zero if no precipitation (or a trace) occurred.
  • The score (half-Brier score) has a range of 0 to 1, with lower scores indicating better forecasts. Example: If the PoP forecast is 100% in 10 cases, and it rains in all 10 cases, then the Brier score is 0. Similarly, if the PoP forecast is 50% in 10 cases, and it rains in only 5 of them, then the Brier score is 0.25.
  • Generally, the rarer the event, the lower (better) the Brier score, regardless of the forecast skill. Therefore, care must be used when comparing the Brier scores for different locations or seasons.
  • On our page, this score is used only for point verification for the weather element PoP.
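
The two PoP examples in the list above can be reproduced with a short Python sketch. The function names are hypothetical, PoP values are assumed to be expressed as fractions between 0 and 1, and precipitation amounts are assumed to be given in inches with a trace recorded as an amount below 0.01.

```python
def observed_event(precip_inches):
    """1 if measurable precipitation (>= 0.01 inches) occurred, otherwise 0."""
    return 1 if precip_inches >= 0.01 else 0


def half_brier_score(pop_forecasts, observations):
    """Mean of (forecast - observation)^2, with forecasts in [0, 1] and observations 0 or 1."""
    if len(pop_forecasts) != len(observations) or not pop_forecasts:
        raise ValueError("need equal-length, non-empty forecasts and observations")
    squared_errors = [(f - o) ** 2 for f, o in zip(pop_forecasts, observations)]
    return sum(squared_errors) / len(squared_errors)


# PoP of 100% in 10 cases, and it rains in all 10.
perfect = half_brier_score([1.0] * 10, [1] * 10)

# PoP of 50% in 10 cases, and it rains in only 5 of them.
coin_flip = half_brier_score([0.5] * 10, [1] * 5 + [0] * 5)

print(perfect)    # 0.0
print(coin_flip)  # 0.25
```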

Heidke Skill Score (HSS)

The Heidke Skill Score (HSS) is a measure of skill in forecasts. It is defined as follows:

(NC - E) / (T - E)

where NC equals the number of correct forecasts (in other words, the number of times the forecast and the observations match), T equals the total number of forecasts, and E equals the number of forecasts expected to verify based on chance.

This can be calculated using a contingency table:

Heidke Skill Score Table

                          Forecast Category
Observed Category    1      2      ...    m      Total
1                    X11    X12    ...    X1m    X1p
2                    X21    X22    ...    X2m    X2p
...                  ...    ...    ...    ...    ...
m                    Xm1    Xm2    ...    Xmm    Xmp
Total                Xp1    Xp2    ...    Xpm    Xpp

where m is the number of categories and the element Xij is the number of times the forecast was in the jth category and the observation was in the ith category. The subscript (and category) p denotes the row and column totals.

\[ NC = \sum_{i=1}^{m} X_{ii} \]

\[ T = X_{pp} \]

\[ E = \frac{1}{T} \sum_{i=1}^{m} X_{ip} X_{pi} \]

A negative HSS indicates that a forecast is worse than a randomly generated forecast. On our page, we indicate a perfect score in two ways. First, for a sample whose forecasts and observations fall into more than one category (i.e., the matched forecast and observation totals occupy more than one cell of the contingency table), the computed HSS = 1.0. Second, for a sample whose forecast/observation total occupies only one cell of the contingency table (there was no reason to forecast anything but the most commonly observed condition), NC, E, and T are all equal, so the formula reduces to 0/0 and cannot be evaluated; in this case we set the HSS equal to 9997. This difference helps to highlight the stations that achieved a perfect score under more difficult forecast conditions.

On our page, Heidke Skill Scores are used for point verification results for the following weather elements: Wind Speed and Wind Dir.
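
The HSS calculation from a contingency table can be sketched in Python as follows. The function name and the sample counts are hypothetical. The table is assumed to be a square list of lists with rows indexed by observed category and columns by forecast category, and returning None for the single-cell case is an assumption of this sketch rather than the convention used on the verification pages.

```python
def heidke_skill_score(table):
    """HSS = (NC - E) / (T - E) for a square table with table[i][j] =
    count of cases observed in category i and forecast in category j."""
    m = len(table)
    if any(len(row) != m for row in table):
        raise ValueError("contingency table must be square")
    row_totals = [sum(row) for row in table]                             # X_ip
    col_totals = [sum(table[i][j] for i in range(m)) for j in range(m)]  # X_pi
    total = sum(row_totals)                                              # T = X_pp
    if total == 0:
        raise ValueError("contingency table is empty")
    correct = sum(table[i][i] for i in range(m))                         # NC
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total  # E
    if total == expected:
        # All cases fall in a single diagonal cell, so the ratio is 0/0.
        # Assumption: signal this with None and let the caller pick a flag value.
        return None
    return (correct - expected) / (total - expected)


# Hypothetical two-category sample: NC = 35, T = 50, E = 25.
hss = heidke_skill_score([[20, 5],
                          [10, 15]])
print(hss)  # 0.4

# Every forecast correct across two categories: HSS = 1.0.
print(heidke_skill_score([[10, 0], [0, 5]]))

# All cases in one cell: the score is undefined, so the sketch returns None.
print(heidke_skill_score([[12, 0], [0, 0]]))
```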


Fraction Correct

The Fraction Correct is a measure of forecast accuracy. It is defined as follows:

NC / T

where NC equals the number of correct forecasts (in other words, the number of times the forecast and the observations match), and T equals the total number of forecasts.

This can be calculated using a contingency table:

Contingency Table

                          Forecast Category
Observed Category    1      2      ...    m      Total
1                    X11    X12    ...    X1m    X1p
2                    X21    X22    ...    X2m    X2p
...                  ...    ...    ...    ...    ...
m                    Xm1    Xm2    ...    Xmm    Xmp
Total                Xp1    Xp2    ...    Xpm    Xpp

where m is the number of categories and the element Xij is the number of times the forecast was in the jth category and the observation was in the ith category. The subscript (and category) p denotes the row and column totals.

\[ NC = \sum_{i=1}^{m} X_{ii} \]

\[ T = X_{pp} \]

On our page, Fraction Correct is used for point verification results for the following weather elements: Sky Cover.
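
As an illustration, Fraction Correct can be computed from a contingency table with a few lines of Python. The function name, the three-category layout, and the counts are hypothetical; the table is assumed to have rows indexed by observed category and columns by forecast category.

```python
def fraction_correct(table):
    """NC / T for a square table with table[i][j] = count of cases
    observed in category i and forecast in category j."""
    m = len(table)
    if any(len(row) != m for row in table):
        raise ValueError("contingency table must be square")
    total = sum(sum(row) for row in table)        # T = X_pp
    if total == 0:
        raise ValueError("contingency table is empty")
    correct = sum(table[i][i] for i in range(m))  # NC = sum of the diagonal
    return correct / total


# Hypothetical three-category sky cover sample: NC = 80, T = 100.
sky_cover_table = [
    [30, 5, 0],
    [4, 20, 6],
    [1, 4, 30],
]
fc = fraction_correct(sky_cover_table)
print(fc)  # 0.8
```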


Relative Frequency

Relative Frequency (RF) is the fraction of forecasts whose error falls within a certain bounded error category. It is defined as follows:

\[ \mathrm{RF}_i = \frac{1}{N} \sum_{i=1}^{N} n_i \]

where (Σ ni) equals the count of direction errors in a certain category and N equals the total number of forecasts/observations.

For example, if there are 10 forecasts with 8 of the forecasts having an error less than or equal to 5 degrees, the relative frequency of forecasts with error of 5 degrees or less is 0.8.
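
The example above can be sketched in Python. The function names and the sample errors are hypothetical, and treating a direction error as the smallest angle between the forecast and observed directions (so that 350 and 10 degrees differ by 20 degrees) is an assumption of this sketch, not a statement of how the verification pages compute it.

```python
def direction_error(forecast_deg, observed_deg):
    """Smallest angle between two compass directions, in degrees (0 to 180)."""
    diff = abs(forecast_deg - observed_deg) % 360
    return min(diff, 360 - diff)


def relative_frequency(errors, max_error):
    """Fraction of the N errors that are less than or equal to max_error."""
    if not errors:
        raise ValueError("at least one error value is required")
    within = sum(1 for e in errors if e <= max_error)
    return within / len(errors)


# Ten hypothetical wind direction errors (degrees); eight are 5 degrees or less.
errors = [0, 2, 5, 3, 1, 4, 5, 2, 10, 20]
rf = relative_frequency(errors, 5)
print(rf)  # 0.8

# Wrap-around case: a 350-degree forecast against a 10-degree observation.
print(direction_error(350, 10))  # 20
```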

On our page, Relative Frequency is used for point verification results for the Wind Dir weather element. A plot of relative frequency is provided for wind direction errors less than 30 degrees for wind speeds of 8 knots or greater only.