improver.calibration.samos_calibration module

This module defines all the “plugins” specific to Standardised Anomaly Model Output Statistics (SAMOS).

class ApplySAMOS(percentiles=None, unique_site_id_key=None)

Bases: PostProcessingPlugin

Class to calibrate an input forecast using SAMOS given the following inputs:

  • Two GAMs which model, respectively, the climatological mean and standard deviation of the forecast. This allows the forecast to be converted to climatological anomalies.

  • A set of EMOS coefficients which can be applied to correct the climatological anomalies.

__init__(percentiles=None, unique_site_id_key=None)

Initialize class.

Parameters:
  • percentiles (Optional[Sequence]) – The set of percentiles used to create the calibrated forecast.

  • unique_site_id_key (Optional[str]) – If working with spot data and available, the name of the coordinate in the input cubes that contains unique site IDs, e.g. “wmo_id” if all sites have a valid wmo_id.

process(forecast, forecast_gams, truth_gams, gam_features, emos_coefficients, gam_additional_fields=None, emos_additional_fields=None, prob_template=None, realizations_count=None, ignore_ecc_bounds=True, tolerate_time_mismatch=False, predictor='mean', randomise=False, random_seed=None)

Calibrate input forecast using GAMs to convert the forecast to climatological anomalies and pre-calculated EMOS coefficients to apply to those anomalies.

Parameters:
  • forecast (Cube) – Uncalibrated forecast as probabilities, percentiles or realizations.

  • forecast_gams (List) – A list containing two fitted GAMs, the first for predicting the climatological mean of the historic forecasts at each location and the second predicting the climatological standard deviation.

  • truth_gams (List) – A list containing two fitted GAMs, the first for predicting the climatological mean of the truths at each location and the second predicting the climatological standard deviation.

  • gam_features (List[str]) – The list of features. These must be either coordinates on forecast or share a name with a cube in gam_additional_fields. The index of each feature should match the indices used in model_specification.

  • emos_coefficients (CubeList) – EMOS coefficients.

  • gam_additional_fields (Optional[CubeList]) – Additional fields to use as supplementary predictors in the GAMs.

  • emos_additional_fields (Optional[CubeList]) – Additional fields to use as supplementary predictors in EMOS.

  • prob_template (Optional[Cube]) – A cube containing a probability forecast that will be used as a template when generating probability output when the input format of the forecast cube is not probabilities i.e. realizations or percentiles.

  • realizations_count (Optional[int]) – Number of realizations to use when generating the intermediate calibrated forecast from probability or percentile inputs.

  • ignore_ecc_bounds (bool) – If True, allow percentiles from probabilities to exceed the ECC bounds range. If input is not probabilities, this is ignored.

  • tolerate_time_mismatch (bool) – If True, tolerate a mismatch in validity time and forecast period for coefficients vs forecasts. Use with caution!

  • predictor (str) – Predictor to be used to calculate the location parameter of the calibrated distribution. Value is “mean” or “realizations”.

  • randomise (bool) – Used in generating calibrated realizations. If input forecast is probabilities or percentiles, this is ignored.

  • random_seed (Optional[int]) – Used in generating calibrated realizations. If input forecast is probabilities or percentiles, this is ignored.

Returns:

Calibrated forecast in the form of the input (i.e. probabilities, percentiles or realizations).

transform_anomalies_to_original_units(location_parameter, scale_parameter, truth_mean, truth_sd, forecast, input_forecast_type)

Function to transform location and scale parameters which describe a climatological anomaly distribution to location and scale parameters which describe a distribution in the units of the original forecast. Both parameter cubes are modified in place.

Predictions of mean and standard deviation from the ‘truth’ GAMs are used for this transformation. This ensures that the calibrated forecast follows the ‘true’ distribution, rather than the distribution of the original forecast, following the suggested method in:

Dabernig, M., Mayr, G.J., Messner, J.W. and Zeileis, A. (2017). Spatial ensemble post-processing with standardized anomalies. Q.J.R. Meteorol. Soc, 143: 909-916. https://doi.org/10.1002/qj.2975

Parameters:
  • location_parameter (Cube) – Cube containing the location parameter of the climatological anomaly distribution. This is modified in place.

  • scale_parameter (Cube) – Cube containing the scale parameter of the climatological anomaly distribution. This is modified in place.

  • truth_mean (Cube) – Cube containing climatological mean predictions of the truths.

  • truth_sd (Cube) – Cube containing climatological standard deviation predictions of the truths.

  • forecast (Cube) – The original, uncalibrated forecast.

  • input_forecast_type (str) – The type of the original, uncalibrated forecast. One of ‘realizations’, ‘percentiles’ or ‘probabilities’.

Return type:

None
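
The back-transformation described above can be pictured with plain NumPy arrays. The sketch below is only an illustration, not the plugin's implementation: it assumes the scale parameter is expressed as a standard deviation rather than a variance, it returns new arrays instead of modifying cubes in place, and the function name anomalies_to_original_units is hypothetical.

```python
import numpy as np


def anomalies_to_original_units(location, scale, truth_mean, truth_sd):
    """Rescale anomaly-space parameters using the truth climatology.

    Assumption: `scale` is a standard deviation. If it were a variance,
    it would need multiplying by truth_sd ** 2 instead.
    """
    location = np.asarray(location, dtype=float)
    scale = np.asarray(scale, dtype=float)
    truth_mean = np.asarray(truth_mean, dtype=float)
    truth_sd = np.asarray(truth_sd, dtype=float)
    # Undo the standardisation: multiply by the climatological spread,
    # then add the climatological mean back on (location only).
    return location * truth_sd + truth_mean, scale * truth_sd


location, scale = anomalies_to_original_units(
    location=[0.5, -1.0],
    scale=[1.0, 0.8],
    truth_mean=[280.0, 275.0],
    truth_sd=[2.0, 4.0],
)
```

Because the truth climatology is used here rather than the forecast climatology, the rescaled parameters sit in the units and spread of the observations, which is the point made in the reference above.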

class TrainEMOSForSAMOS(distribution, emos_kwargs=None, unique_site_id_key=None)

Bases: BasePlugin

Class to calculate Ensemble Model Output Statistics (EMOS) coefficients to calibrate climate anomaly forecasts given training data including forecasts and verifying observations and four Generalized Additive Models (GAMs) which model:

  • forecast mean,

  • forecast standard deviation,

  • observation mean,

  • observation standard deviation.

This class first calculates climatological means and standard deviations by predicting them from the input GAMs. Following this, the input forecasts and observations are converted to climatological anomalies using the predicted means and standard deviations. Finally, EMOS coefficients are calculated from the climatological anomaly training data.
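
The conversion to climatological anomalies mentioned above is a standardisation: subtract the climatological mean and divide by the climatological standard deviation. A minimal NumPy sketch follows, assuming the GAM predictions are already available as arrays; the function name to_standardised_anomaly and all values are hypothetical and not part of the plugin.

```python
import numpy as np


def to_standardised_anomaly(values, clim_mean, clim_sd):
    """Return (values - clim_mean) / clim_sd, element by element."""
    values = np.asarray(values, dtype=float)
    clim_mean = np.asarray(clim_mean, dtype=float)
    clim_sd = np.asarray(clim_sd, dtype=float)
    return (values - clim_mean) / clim_sd


# Forecasts and truths are each standardised with their own climatology.
forecast_anomaly = to_standardised_anomaly(
    [282.0, 278.0], clim_mean=[280.0, 280.0], clim_sd=[2.0, 4.0]
)
truth_anomaly = to_standardised_anomaly(
    [281.0, 277.0], clim_mean=[279.0, 279.0], clim_sd=[2.0, 2.0]
)
```

EMOS coefficients would then be fitted to pairs such as forecast_anomaly and truth_anomaly rather than to the raw values.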

__init__(distribution, emos_kwargs=None, unique_site_id_key=None)

Initialize the class.

Parameters:
  • distribution (str) – Name of distribution. Assume that a calibrated version of the climate anomaly forecast could be represented using this distribution.

  • emos_kwargs (Optional[Dict]) – Keyword arguments accepted by the EstimateCoefficientsForEnsembleCalibration plugin. Should not contain a distribution argument.

  • unique_site_id_key (Optional[str]) – If working with spot data and available, the name of the coordinate in the input cubes that contains unique site IDs, e.g. “wmo_id” if all sites have a valid wmo_id.

climate_anomaly_emos(forecast_cubes, truth_cubes, additional_fields=None, landsea_mask=None)

Function to convert forecasts and truths to climate anomalies then calculate EMOS coefficients for the climate anomalies.

Parameters:
  • forecast_cubes (List[Cube]) – A list of three cubes: a cube containing historic forecasts, a cube containing climatological mean predictions of the forecasts and a cube containing climatological standard deviation predictions of the forecasts.

  • truth_cubes (List[Cube]) – A list of three cubes: a cube containing historic truths, a cube containing climatological mean predictions of the truths and a cube containing climatological standard deviation predictions of the truths.

  • additional_fields (Optional[CubeList]) – Additional fields to use as supplementary predictors.

  • landsea_mask (Optional[Cube]) – The optional cube containing a land-sea mask. If provided, only land points are used to calculate the coefficients. Within the land-sea mask cube land points should be specified as ones, and sea points as zeros.

Return type:

CubeList

Returns:

CubeList constructed using the coefficients provided and using metadata from the historic_forecasts cube. Each cube within the cubelist is for a separate EMOS coefficient e.g. alpha, beta, gamma, delta.

process(historic_forecasts, truths, forecast_gams, truth_gams, gam_features, gam_additional_fields=None, emos_additional_fields=None, landsea_mask=None)

Function to convert historic forecasts and truths to climatological anomalies, then fit EMOS coefficients to these anomalies.

Parameters:
  • historic_forecasts (Cube) – Historic forecasts from the training dataset.

  • truths (Cube) – Truths from the training dataset.

  • forecast_gams (List) – A list containing two fitted GAMs, the first for predicting the climatological mean of the locations in historic_forecasts and the second predicting the climatological standard deviation. Appropriate GAMs are produced by the TrainGAMsForSAMOS plugin.

  • truth_gams (List) – A list containing two fitted GAMs, the first for predicting the climatological mean of the locations in truths and the second predicting the climatological standard deviation. Appropriate GAMs are produced by the TrainGAMsForSAMOS plugin.

  • gam_features (List[str]) – The list of features. These must be either coordinates on the input cubes (historic_forecasts and truths) or share a name with a cube in gam_additional_fields. The index of each feature must match the indices used in model_specification.

  • gam_additional_fields (Optional[CubeList]) – Additional fields to use as supplementary predictors in the GAMs.

  • emos_additional_fields (Optional[CubeList]) – Additional fields to use as supplementary predictors in EMOS.

  • landsea_mask (Optional[Cube]) – The optional cube containing a land-sea mask. If provided, only land points are used to calculate the EMOS coefficients. Within the land-sea mask cube land points should be specified as ones, and sea points as zeros.

Return type:

CubeList

Returns:

CubeList constructed using the coefficients provided and using metadata from the historic_forecasts cube. Each cube within the cubelist is for a separate EMOS coefficient e.g. alpha, beta, gamma, delta.

class TrainGAMsForSAMOS(model_specification, max_iter=100, tol=0.0001, distribution='normal', link='identity', fit_intercept=True, window_length=11, valid_rolling_window_fraction=0.5, unique_site_id_key=None)

Bases: BasePlugin

Class for fitting Generalised Additive Models (GAMs) to training data for use in a Standardised Anomaly Model Output Statistics (SAMOS) calibration scheme.

Two GAMs are trained: one modelling the mean of the training data and one modelling the standard deviation. These can then be used to convert forecasts or observations to climatological anomalies. This plugin should be run separately for forecast and observation data.

__init__(model_specification, max_iter=100, tol=0.0001, distribution='normal', link='identity', fit_intercept=True, window_length=11, valid_rolling_window_fraction=0.5, unique_site_id_key=None)

Initialize the class.

Parameters:
  • model_specification (list[list[str], list[int], dict]) – A list of lists which each contain three items (in order):

      1. a string containing a single pyGAM term; one of ‘linear’, ‘spline’, ‘tensor’, or ‘factor’.

      2. a list of integers which correspond to the features to be included in that term.

      3. a dictionary of kwargs to be included when defining the term.

  • max_iter (int) – A pyGAM argument which determines the maximum iterations allowed when fitting the GAM.

  • tol (float) – A pyGAM argument determining the tolerance used to define the stopping criteria.

  • distribution (str) – A pyGAM argument determining the distribution to be used in the model.

  • link (str) – A pyGAM argument determining the link function to be used in the model.

  • fit_intercept (bool) – A pyGAM argument determining whether to include an intercept term in the model.

  • window_length (int) – This must be an odd integer greater than 1. The length of the rolling window used to calculate the mean and standard deviation of the input cube when the input cube does not have a realization dimension coordinate.

  • valid_rolling_window_fraction (float) – This must be a float between 0 and 1, inclusive. When performing rolling window calculations, if a given window has less than this fraction of valid data points (not NaN) then the value returned will be NaN and will be excluded from training.

  • unique_site_id_key (Optional[str]) – An optional key to use for uniquely identifying each site in the training data. If not provided, the default behavior is to use the spatial coordinates (latitude, longitude) of each site.
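
To make the model_specification structure described above concrete, the sketch below builds a two-term specification and resolves its integer indices against a feature list. The feature names, the n_splines keyword value and the helper describe_terms are all illustrative assumptions rather than part of this module.

```python
# Hypothetical feature names; in practice these would be coordinates on the
# input cube or names of cubes supplied as additional fields.
features = ["latitude", "longitude", "altitude"]

# Each entry is [term type, feature indices, kwargs for that term].
model_specification = [
    ["tensor", [0, 1], {}],  # joint smooth over latitude and longitude
    ["spline", [2], {"n_splines": 10}],  # smooth over altitude (assumed kwarg)
]

ALLOWED_TERMS = {"linear", "spline", "tensor", "factor"}


def describe_terms(model_specification, features):
    """Check each term and map its integer indices to feature names."""
    described = []
    for term, indices, kwargs in model_specification:
        if term not in ALLOWED_TERMS:
            raise ValueError(f"Unknown term type: {term}")
        if not isinstance(kwargs, dict):
            raise TypeError("The third item of each term must be a dict")
        described.append((term, [features[i] for i in indices]))
    return described


terms = describe_terms(model_specification, features)
```

The integers in the second item index into the features list passed to process, which is why the order of that list has to match the specification.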

apply_aggregator(padded_cube, aggregator)

Internal function to apply rolling window aggregator to padded cube.

Parameters:
  • padded_cube (Cube) – The cube to have rolling window calculation applied to.

  • aggregator (WeightedAggregator) – The aggregator to use in the rolling window calculation.

Return type:

Cube

Returns:

A cube containing the result of the rolling window calculation. Any cell methods and time bounds are removed from the cube as they are not necessary for later calculations.

calculate_cube_statistics(input_cube)

Function to calculate mean and standard deviation of the input cube. If the cube has a realization dimension then statistics will be calculated by collapsing over this dimension. Otherwise, a rolling window calculation over the time dimension will be used.

The rolling window method calculates a statistic over data in a fixed time window and assigns the value of the statistic to the central time in the window. For example, for data points [0.0, 1.0, 2.0, 1.0, 0.0] each valid in consecutive hours T+0, T+1, T+2, T+3, T+4, the mean calculated by a rolling window of width 5 would be 0.8. This value would be associated with T+2 in the resulting cube.

To enable this calculation to produce a cube of the same dimensions as input_cube, the data in input_cube is first padded with additional data. For a rolling window of width 5, 2 data slices are added to the start and end of the input_cube time coordinate. The data in these slices are masked so that they don’t affect the calculated statistics.
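
The padded rolling window described above can be reproduced with NumPy masked arrays. This is a minimal one-dimensional sketch under stated assumptions: the plugin itself works on cubes, and the function name rolling_mean_with_masked_padding and the way the valid-fraction check is applied here are hypothetical.

```python
import numpy as np


def rolling_mean_with_masked_padding(values, window_length, valid_fraction=0.5):
    """Centred rolling mean whose output has the same length as the input."""
    half = window_length // 2
    data = np.asarray(values, dtype=float)
    # Pad both ends with NaN and mask them so they do not affect the mean.
    padding = np.full(half, np.nan)
    padded = np.ma.masked_invalid(np.concatenate([padding, data, padding]))
    result = np.full(data.shape, np.nan)
    for i in range(data.size):
        window = padded[i : i + window_length]
        # Only keep windows with enough unmasked points; others stay NaN.
        if window.count() / window_length >= valid_fraction:
            result[i] = window.mean()
    return result


means = rolling_mean_with_masked_padding([0.0, 1.0, 2.0, 1.0, 0.0], window_length=5)
```

For the five data points in the example above, the central value (index 2, T+2) is 0.8, and the windows near the edges average only their unmasked points.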

Parameters:

input_cube (Cube) – A cube with at least one of the following coordinates:

  1. A realization dimension coordinate.

  2. A time coordinate with more than one point and evenly spaced points.

Return type:

CubeList

Returns:

CubeList containing a mean cube and standard deviation cube.

Raises:
  • ValueError – If input_cube does not contain a realization coordinate and does contain a time coordinate with unevenly spaced points.

calculate_statistic_by_rolling_window(input_cube)

Function to calculate mean and standard deviation of input_cube using a rolling window calculation over the time coordinate.

The input_cube time coordinate is padded at the beginning and end of the time coordinate, to ensure that the result of the rolling window calculation has the same shape as input_cube. Additionally, any missing time points in the input cube are filled with masked data, so that the rolling window is always taken over a period containing an equal number of time points.

process(input_cube, features, additional_fields=None)

Function to fit GAMs to model the mean and standard deviation of the input_cube for use in SAMOS.

Parameters:
  • input_cube (Cube) – Historic forecasts or observations from the training dataset. Must contain at least one of:

      - a realization coordinate

      - a time coordinate with more than one point and equally spaced points

  • features (List[str]) – The list of features. These must be either coordinates on input_cube or share a name with a cube in additional_fields. The index of each feature should match the indices used in model_specification.

  • additional_fields (Optional[CubeList]) – Additional fields to use as supplementary predictors.

Return type:

List

Returns:

A list containing fitted GAMs which model the input_cube mean and standard deviation.

Raises:
  • ValueError – If input_cube does not contain at least one of a realization or time coordinate.

  • ValueError – If the input cube does not have a realization coordinate and the time coordinate that it does have contains only one point.

convert_dataframe_to_cube(df, template_cube)

Function to convert a Pandas dataframe to Iris cube format by using a template cube. The input template_cube provides all metadata for the output.

Parameters:
  • df (DataFrame) – A Pandas dataframe which must contain at least the following columns:

      1. A column matching the name of template_cube.

      2. A series of columns with names which match the dimension coordinates on template_cube. The data in these columns should match the points on the corresponding dimension of template_cube.

  • template_cube (Cube) – A cube which will provide all metadata for the output cube.

Return type:

Cube

Returns:

A copy of template_cube containing data from df.

get_climatological_stats(input_cube, gams, gam_features, additional_cubes, sd_clip=0.25, unique_site_id_key=None)

Function to predict climatological means and standard deviations given fitted GAMs for each statistic and cubes which can be used to construct a dataframe containing all required features for those GAMs.

Parameters:
  • input_cube (Cube)

  • gams (List) – A list containing two fitted GAMs, the first for predicting the climatological mean of the locations in input_cube and the second predicting the climatological standard deviation.

  • gam_features (List[str]) – The list of features. These must be either coordinates on input_cube or share a name with a cube in additional_cubes. The index of each feature should match the indices used in model_specification.

  • additional_cubes (Optional[CubeList]) – Additional fields to use as supplementary predictors.

  • sd_clip (float) – The minimum standard deviation value to allow when predicting from the GAM. Any predictions below this value will be set to this value.

  • unique_site_id_key (Optional[str]) – If working with spot data and available, the name of the coordinate in the input cubes that contains unique site IDs, e.g. “wmo_id” if all sites have a valid wmo_id.

Return type:

Tuple[Cube, Cube]

Returns:

A pair of cubes containing climatological mean and climatological standard deviation predictions respectively.
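
The effect of sd_clip described above amounts to a lower bound on the predicted standard deviations. A small NumPy sketch with made-up prediction values is shown below; how the function applies the bound internally is not stated here, so this only illustrates the documented outcome.

```python
import numpy as np

# Hypothetical raw GAM predictions of the climatological standard deviation.
predicted_sd = np.array([0.05, 0.25, 1.3, -0.1])
sd_clip = 0.25

# Any prediction below sd_clip is set to sd_clip; larger values are untouched.
clipped_sd = np.maximum(predicted_sd, sd_clip)
```

A floor of this kind keeps any later division by the standard deviation well behaved when anomalies are computed.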

prepare_data_for_gam(input_cube, additional_fields=None, unique_site_id_key=None)

Convert input cubes into a single, combined dataframe.

Each of the input cubes is converted to a pandas dataframe. The dataframe derived from input_cube then forms the left in a series of left dataframe joins with those derived from each cube in additional_fields. The x and y coordinates are used to perform this join. This means that the resulting combined dataframe will contain all of the sites/grid points in input_cube, but not any other sites/grid points in the additional_fields cubes.
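
The left-join behaviour described above can be sketched directly with pandas. The column names and values below are hypothetical, and the real function builds its dataframes from cubes; the sketch only shows why points that appear solely in an additional field are dropped.

```python
import pandas as pd

# Dataframe standing in for the one derived from input_cube (two sites).
forecast_df = pd.DataFrame(
    {
        "latitude": [50.0, 51.0],
        "longitude": [-1.0, 0.0],
        "air_temperature": [281.2, 279.8],
    }
)

# Dataframe standing in for one additional field (three sites).
altitude_df = pd.DataFrame(
    {
        "latitude": [50.0, 51.0, 52.0],
        "longitude": [-1.0, 0.0, 1.0],
        "surface_altitude": [10.0, 85.0, 240.0],
    }
)

# Left join on the spatial coordinates: every row of forecast_df is kept,
# and the third site, present only in altitude_df, is discarded.
combined = forecast_df.merge(altitude_df, how="left", on=["latitude", "longitude"])
```

With several additional fields, the same merge would be repeated, each time keeping the dataframe derived from input_cube on the left.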

Parameters:
  • input_cube (Cube) – A cube of forecast or observation data.

  • additional_fields (Optional[CubeList]) – Additional cubes with points which can be matched with points in input_cube by matching spatial coordinate values.

  • unique_site_id_key (Optional[str]) – If working with spot data and available, the name of the coordinate in the input cubes that contains unique site IDs, e.g. “wmo_id” if all sites have a valid wmo_id.

Return type:

DataFrame

Returns:

A pandas dataframe with rows equal to the number of sites/grid points in input_cube and containing the following columns:

  1. A column with the same name as input_cube containing the original cube data.

  2. A series of columns derived from the input_cube dimension coordinates.

  3. A series of columns associated with any auxiliary coordinates (scalar or otherwise) of input_cube.

  4. One column associated with each of the cubes in additional_fields, with column names matching the associated cube.

class pygam

Bases: object

GAM()