Probability Distribution#
Context#
Because IMPROVER is inherently probabilistic, this section focuses specifically on how probability distributions are represented in the metadata. This includes some extensions to the CF Metadata Conventions, which on their own provide only limited support in this area.
Probability distributions can be represented in one of three different ways:
- Ensemble members: a set of realizations (or scenarios), each holding a possible value of the diagnostic of interest;
- Probabilities of the value of the diagnostic being above, below or between a set of thresholds;
- Percentile values representing thresholds of the distribution of the diagnostic below which the value will occur with fixed relative frequency.
Each has different strengths and weaknesses in different situations, which are not discussed here. Each also has different metadata, which is described in the sections that follow.
Ensemble members#
This is the simplest form of representing forecast uncertainty, as it is a natural extension to deterministic forecasts, incorporating a number of realizations (or versions of the forecast). As such, its representation can be accommodated through the inclusion of an additional coordinate variable.
For ensemble member data, the following must be present:
- Dimension: realization
- Coordinate variable: realization, with:
  - Units: 1
  - Standard name: realization
An example would be the Met Office MOGREPS-UK model, which runs every hour to generate a 3-member ensemble:
dimensions:
    realization = 3 ;
variables:
    float air_temperature(realization) ;
        air_temperature:standard_name = "air_temperature" ;
        air_temperature:units = "K" ;
    int realization(realization) ;
        realization:units = "1" ;
        realization:standard_name = "realization" ;
data:
    realization = 0, 1, 2 ;
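The requirements above can be checked mechanically. The following is a minimal sketch in Python, assuming the file contents have already been read into a plain dictionary; the dictionary layout and the function name `realization_problems` are hypothetical and chosen for illustration only, not part of IMPROVER or any netCDF library.

```python
# Hypothetical in-memory description of the CDL example above. The dict layout
# is an assumption made for illustration, not an IMPROVER or netCDF API.
ensemble_file = {
    "dimensions": {"realization": 3},
    "variables": {
        "air_temperature": {
            "dims": ("realization",),
            "attrs": {"standard_name": "air_temperature", "units": "K"},
        },
        "realization": {
            "dims": ("realization",),
            "attrs": {"standard_name": "realization", "units": "1"},
            "data": [0, 1, 2],
        },
    },
}


def realization_problems(file_description):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if "realization" not in file_description["dimensions"]:
        problems.append("missing 'realization' dimension")
    coord = file_description["variables"].get("realization")
    if coord is None:
        problems.append("missing 'realization' coordinate variable")
        return problems
    attrs = coord["attrs"]
    if attrs.get("units") != "1":
        problems.append("realization units must be '1'")
    if attrs.get("standard_name") != "realization":
        problems.append("realization standard_name must be 'realization'")
    return problems


print(realization_problems(ensemble_file))
```

Returning a list of problems rather than raising on the first failure makes it possible to report every missing item in one pass.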
Percentiles#
This is probably the second most straightforward form, as it still represents actual values of the diagnostic. Instead of realizations (separate scenarios, each self-consistent over time), the set of percentiles represents the values of a set of thresholds below which the value of the diagnostic will occur with fixed relative frequency. This can again be incorporated by adding a coordinate variable.
For percentile data, the following must be present:
- Dimension: percentile
- Coordinate variable: percentile, with:
  - Units: %
  - Long name: percentile
An example would be a set of percentile values for temperature:
dimensions:
    percentile = 13 ;
variables:
    float air_temperature(percentile) ;
        air_temperature:standard_name = "air_temperature" ;
        air_temperature:units = "K" ;
    float percentile(percentile) ;
        percentile:units = "%" ;
        percentile:long_name = "percentile" ;
data:
    percentile = 5, 10, 15, 20, 30, 40, 50, 60, 70, 80, 85, 90, 95 ;
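To illustrate how percentile values relate to ensemble members, here is a minimal sketch in Python that derives them by linear interpolation between the sorted member values. The function name `percentile_values` and the interpolation convention are assumptions made for illustration; the method IMPROVER itself uses may differ.

```python
def percentile_values(members, percentiles):
    """Estimate diagnostic values at the given percentiles (in %).

    Assumes linear interpolation between sorted members, with the smallest
    member at 0 % and the largest at 100 %. This is one common convention,
    not necessarily the one used by IMPROVER.
    """
    ordered = sorted(members)
    last = len(ordered) - 1
    values = []
    for p in percentiles:
        position = (p / 100.0) * last      # fractional index into `ordered`
        lower = int(position)              # index of the member just below
        upper = min(lower + 1, last)       # index of the member just above
        fraction = position - lower
        values.append(ordered[lower] + fraction * (ordered[upper] - ordered[lower]))
    return values


# Three hypothetical air temperature members, in K.
members = [284.0, 280.0, 282.0]
print(percentile_values(members, [25, 50, 75]))
```

The returned list lines up element by element with the percentile coordinate values, which is what the shared `percentile` dimension expresses in the metadata.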
Probabilities#
This is a more interesting form, as the diagnostic values are transformed into a set of probabilities of being above, below, or between a set of thresholds, so the metadata changes more substantially. This can be catered for with a new coordinate variable to represent the set of thresholds.
For probability data, the following must be present:
- Dimension: threshold
- Coordinate variable: threshold, with:
  - Units appropriate to the original diagnostic (indicated by V in the following text)
  - Standard_name or long_name (as appropriate) set to that of the original diagnostic (V)
- Main variable, with:
  - Units: 1
  - Long name set to one of the following (as appropriate), where V is the standard or long name of the original variable:
    - probability_of_V_above_threshold
    - probability_of_V_below_threshold
- A new non-CF attribute spp__relative_to_threshold, set on the threshold coordinate variable, which is used to indicate the nature of the threshold inequality, and takes one of the four values:
  - greater_than
  - greater_than_or_equal_to
  - less_than
  - less_than_or_equal_to
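The naming rules above can be sketched as a small Python helper. The function name `probability_metadata` and the returned dictionary layout are hypothetical. The mapping of both `greater_than` variants to `above` and both `less_than` variants to `below` is an assumption, as is the use of `standard_name` for the threshold (a diagnostic with no standard name would use `long_name` instead).

```python
# Assumed mapping from the inequality attribute to the word used in the name.
RELATION_TO_SUFFIX = {
    "greater_than": "above",
    "greater_than_or_equal_to": "above",
    "less_than": "below",
    "less_than_or_equal_to": "below",
}


def probability_metadata(diagnostic_name, diagnostic_units, relation):
    """Build attributes for a probability variable and its threshold coordinate."""
    if relation not in RELATION_TO_SUFFIX:
        raise ValueError(f"unsupported relation: {relation!r}")
    suffix = RELATION_TO_SUFFIX[relation]
    long_name = f"probability_of_{diagnostic_name}_{suffix}_threshold"
    return {
        # Main variable: dimensionless probability with a derived long name.
        "variable": {"long_name": long_name, "units": "1"},
        # Threshold coordinate: keeps the name and units of the diagnostic.
        "threshold": {
            "standard_name": diagnostic_name,
            "units": diagnostic_units,
            "spp__relative_to_threshold": relation,
        },
    }


print(probability_metadata("air_temperature", "K", "greater_than_or_equal_to"))
```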
An example would be a set of probabilities of temperature exceeding a set of 79 thresholds:
dimensions:
    threshold = 79 ;
variables:
    float probability_of_air_temperature_above_threshold(threshold) ;
        probability_of_air_temperature_above_threshold:long_name = "probability_of_air_temperature_above_threshold" ;
        probability_of_air_temperature_above_threshold:units = "1" ;
    float threshold(threshold) ;
        threshold:units = "K" ;
        threshold:standard_name = "air_temperature" ;
        threshold:spp__relative_to_threshold = "greater_than_or_equal_to" ;
data:
    threshold = 213.15, 218.15, 223.15, 228.15, 233.15, 238.15, 243.15, ....
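To show why the inequality attribute matters, the following minimal Python sketch computes threshold probabilities from a set of ensemble members as the fraction of members satisfying the inequality. The function name `threshold_probabilities` and the simple counting approach are assumptions made for illustration; they are not taken from IMPROVER.

```python
import operator

# The four values permitted for spp__relative_to_threshold, mapped to the
# comparison each one implies (member value on the left, threshold on the right).
COMPARATORS = {
    "greater_than": operator.gt,
    "greater_than_or_equal_to": operator.ge,
    "less_than": operator.lt,
    "less_than_or_equal_to": operator.le,
}


def threshold_probabilities(members, thresholds, relation):
    """Return, for each threshold, the fraction of members satisfying `relation`."""
    compare = COMPARATORS[relation]
    count = len(members)
    return [
        sum(1 for member in members if compare(member, threshold)) / count
        for threshold in thresholds
    ]


# Three hypothetical air temperature members and three thresholds, in K.
members = [271.0, 273.15, 275.0]
thresholds = [270.0, 273.15, 280.0]
print(threshold_probabilities(members, thresholds, "greater_than_or_equal_to"))
print(threshold_probabilities(members, thresholds, "greater_than"))
```

The two calls differ only at the 273.15 K threshold, where one member sits exactly on the threshold. Recording the inequality in the metadata removes that ambiguity for downstream users.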