improver.calibration package

Submodules

Module contents

Initialisation module for calibration, containing functionality to split forecast, truth and coefficient inputs.

class CalibrationSchemas

Bases: object

__init__()

Define the pyarrow schemas for forecast and truth parquet files.

add_feature_from_df_to_df(forecast_df, feature_df, feature_name, possible_merge_columns, float_decimals=4)

Add a feature to the forecast DataFrame from a second DataFrame based on the feature configuration. Columns within possible_merge_columns that are float are rounded to a specified number of decimal places before merging to avoid precision issues.

Parameters:
  • forecast_df (DataFrame) – DataFrame containing the forecast data.

  • feature_df (DataFrame) – DataFrame containing the feature data.

  • feature_name (str) – Name of the feature to be added.

  • possible_merge_columns (list[str]) – List of column names that can be used to merge the feature DataFrame to the forecast DataFrame.

  • float_decimals (int) – Number of decimal places to round float columns to before merging. Default is 4, which corresponds to rounding to 0.0001.

Returns:

DataFrame with additional feature added.
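The rounding step matters because two floats that should be equal can differ in their last bits, which makes an exact-match merge silently drop rows. The following is a minimal sketch of that idea using plain dictionaries rather than pandas; it is not the improver implementation, and the helper names (`merge_key`, `add_feature`) and the sample values are hypothetical.

```python
def merge_key(row, key_columns, float_decimals=4):
    """Build a hashable merge key, rounding float values first."""
    return tuple(
        round(row[col], float_decimals) if isinstance(row[col], float) else row[col]
        for col in key_columns
    )


def add_feature(forecast_rows, feature_rows, feature_name, key_columns, float_decimals=4):
    """Left-join feature_name onto each forecast row using the rounded keys."""
    lookup = {
        merge_key(row, key_columns, float_decimals): row[feature_name]
        for row in feature_rows
    }
    return [
        {**row, feature_name: lookup.get(merge_key(row, key_columns, float_decimals))}
        for row in forecast_rows
    ]


# 0.1 + 0.2 is not exactly equal to 0.3 in floating point, so an exact match fails.
forecast_rows = [{"wmo_id": "03002", "latitude": 0.1 + 0.2, "forecast": 281.5}]
feature_rows = [{"wmo_id": "03002", "latitude": 0.3, "altitude": 15.0}]

merged = add_feature(forecast_rows, feature_rows, "altitude", ["wmo_id", "latitude"])
```

With the keys rounded to four decimal places, both latitudes become 0.3 and the altitude is attached to the forecast row. Forecast rows without a matching key would receive None for the feature in this sketch.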

add_static_feature_from_cube_to_df(forecast_df, feature_cube, feature_name, possible_merge_columns, float_decimals=4)

Add a static feature to the forecast DataFrame from a cube based on the feature configuration. Other features are expected to already be present in the forecast DataFrame. Columns within possible_merge_columns that are float after converting from a Cube to a DataFrame are rounded to a specified number of decimal places before merging to avoid precision issues.

Parameters:
  • forecast_df (DataFrame) – DataFrame containing the forecast data.

  • feature_cube – Cube containing the static feature to be added.

  • feature_name (str) – Name of the feature to be added.

  • possible_merge_columns (list[str]) – List of column names that can be used to merge the feature DataFrame to the forecast DataFrame.

  • float_decimals (int) – Number of decimal places to round float columns to before merging. Default is 4, which corresponds to rounding to 0.0001.

Return type:

DataFrame

Returns:

DataFrame with additional feature added from the input cube.

add_warning_comment(forecast)

Add a comment to warn that calibration has not been applied.

Parameters:

forecast (Cube) – The forecast to which a comment will be added.

Return type:

Cube

Returns:

Forecast with an additional comment.

get_common_wmo_ids(forecast_cube, truth_cube, additional_predictors=None)

Extract the common WMO IDs from the forecast, truth and any additional predictor cubes.

Parameters:
  • forecast_cube (Cube) – Cube containing the forecast data.

  • truth_cube (Cube) – Cube containing the truth data.

  • additional_predictors (Optional[CubeList]) – CubeList containing any additional predictors.

Raises:

IOError – If no common WMO IDs are found in the input cubes.

Return type:

Tuple[Cube, Cube, CubeList]

Returns:

The forecast, truth and additional predictor cubes with only the common WMO IDs retained.
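Finding the common WMO IDs amounts to a set intersection across all inputs, with an error when nothing is shared. The sketch below shows that logic on plain lists of ID strings rather than cubes; the function name `common_ids` and the sample IDs are hypothetical, and the real function additionally subsets each cube to the shared sites.

```python
def common_ids(*id_collections):
    """Return the sorted IDs present in every collection; raise if none are shared."""
    shared = set(id_collections[0]).intersection(*id_collections[1:])
    if not shared:
        raise IOError("No common WMO IDs found in the inputs.")
    return sorted(shared)


# Illustrative site IDs for the forecast, truth and one additional predictor.
forecast_ids = ["03002", "03005", "03772"]
truth_ids = ["03005", "03772", "03999"]
predictor_ids = ["03772", "03005"]

shared = common_ids(forecast_ids, truth_ids, predictor_ids)
```

Only the two IDs present in all three inputs are kept here; passing inputs with no overlap raises IOError, mirroring the documented behaviour.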

get_training_period_cycles(cycletime, forecast_period, training_length)

Generate a list of forecast reference times for the training period.

Parameters:
  • cycletime (str) – The time at which the forecast is issued in a format understood by pandas.Timestamp e.g. 20170109T0000Z.

  • forecast_period (Union[int, str]) – The forecast period in seconds.

  • training_length (int) – The number of days in the training period.
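The documentation does not spell out how the reference times are derived, so the following is only a sketch under stated assumptions: one cycle per day, with the most recent cycle falling one forecast period before cycletime. It uses the standard library instead of pandas, and the function name `training_cycles` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def training_cycles(cycletime, forecast_period, training_length):
    """List daily reference times, assuming the last one is cycletime - forecast_period."""
    issued = datetime.strptime(cycletime, "%Y%m%dT%H%MZ").replace(tzinfo=timezone.utc)
    # forecast_period may arrive as a string, so convert it to an integer of seconds.
    last_cycle = issued - timedelta(seconds=int(forecast_period))
    return [
        last_cycle - timedelta(days=offset)
        for offset in range(training_length - 1, -1, -1)
    ]


# A 6-hour forecast period (21600 s) and a 3-day training period.
cycles = training_cycles("20170109T0000Z", "21600", 3)
```

Under these assumptions the three cycles fall at 18:00 UTC on 6, 7 and 8 January 2017. If the package defines the training window differently, only the calculation of `last_cycle` would need to change.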

identify_parquet_type(parquet_paths)

Determine whether the provided parquet paths contain forecast or truth data. This is done by checking the columns within the parquet files for the presence of a forecast_period column, which is present only in forecast data.

Parameters:

parquet_paths (List[Path]) – A list of paths to Parquet files.

Returns:

  • The path to the Parquet file containing the historical forecasts.

  • The path to the Parquet file containing the truths.
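The column check described above can be sketched as follows. To keep the example self-contained, the column names are supplied in a dictionary keyed by path; in practice they could be read without loading the data, for example with `pyarrow.parquet.read_schema(path).names`. The function name `classify_parquet`, the file names and every column other than forecast_period are hypothetical.

```python
from pathlib import Path


def classify_parquet(columns_by_path):
    """Return (forecast_path, truth_path) based on which file has forecast_period."""
    forecast_path = truth_path = None
    for path, columns in columns_by_path.items():
        if "forecast_period" in columns:
            forecast_path = path
        else:
            truth_path = path
    return forecast_path, truth_path


# Illustrative schemas: only the forecast file carries a forecast_period column.
columns_by_path = {
    Path("truth.parquet"): ["wmo_id", "time", "ob_value"],
    Path("forecast.parquet"): ["wmo_id", "time", "forecast_period", "forecast"],
}

forecast_path, truth_path = classify_parquet(columns_by_path)
```

The order of the input paths does not matter, because each file is classified by its columns rather than by its position or name.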

split_cubes_for_samos(cubes, gam_features, truth_attribute=None, expect_emos_coeffs=False, expect_emos_fields=False)

Split the input cubes into the forecast, truth, GAM additional predictor and EMOS additional predictor cubes.

Parameters:
  • cubes (CubeList) – A list of input cubes which will be split into relevant groups.

  • gam_features (List[str]) – A list of strings containing the names of the additional fields required for the SAMOS GAMs.

  • truth_attribute (Optional[str]) – An attribute and its value in the format of “attribute=value”, which must be present on truth cubes. If None, no truth cubes are expected or returned.

  • expect_emos_coeffs (bool) – If True, EMOS coefficient cubes are expected to be found in the input cubes. If False, an error will be raised if any such cubes are found.

  • expect_emos_fields (bool) – If True, additional EMOS fields are expected to be found in the input cubes. If False, an error will be raised if any such cubes are found.

Raises:
  • IOError – If no forecast cube is found and/or no truth cube is found when a truth_attribute has been provided.

  • IOError – If EMOS coefficients cubes are found when they are not expected.

  • IOError – If additional fields cubes are found which do not match the features in gam_features.

  • IOError – If probability cubes are provided with more than one name.

Returns:

  • A cube containing all the historic forecasts, or None if no such cubes were found.

  • A cube containing all the truth data, or None if no such cubes were found or no truth_attribute was provided.

  • A cubelist containing all the additional fields required for the GAMs, or None if no such cubes were found.

  • A cubelist containing all the EMOS coefficient cubes, or None if no such cubes were found.

  • A cubelist containing all the additional fields required for EMOS, or None if no such cubes were found.

  • A cube containing a probability template, or None if no such cube is found.

split_forecasts_and_bias_files(cubes)

Split the input forecast from the forecast error files used for bias-correction.

Parameters:

cubes (CubeList) – A list of input cubes which will be split into forecast and forecast errors.

Return type:

Tuple[Cube, Optional[CubeList]]

Returns:

  • A cube containing the current forecast.

  • If found, a cube or cubelist containing the bias correction files.

Raises:
  • ValueError – If multiple forecast cubes are provided when only one is expected.

  • ValueError – If no forecast is found.

split_forecasts_and_coeffs(cubes, land_sea_mask_name=None)

Split the input forecast, coefficients, static additional predictors, land-sea mask and probability template, if provided. The coefficients cubes and land-sea mask are identified based on their name. The static additional predictors are identified by not having a time coordinate. The current forecast and probability template are then split.

Parameters:
  • cubes (Union[List[CubeList[Cube]], List[List[Cube]]]) – A list either containing a CubeList or containing a list of input cubes which will be split into relevant groups. This includes the forecast, coefficients, static additional predictors, land-sea mask and probability template.

  • land_sea_mask_name (Optional[str]) – Name of the land-sea mask cube to help identification.

Returns:

  • A cube containing the current forecast.

  • If found, a cubelist containing the coefficients, else None.

  • If found, a cubelist containing the static additional predictors, else None.

  • If found, a land-sea mask will be returned, else None.

  • If found, a probability template will be returned, else None.

Raises:
  • ValueError – If multiple items are provided when only one is expected.

  • ValueError – If no forecast is found.

split_forecasts_and_truth(cubes, truth_attribute)

A common utility for splitting the various input cubes required for calibration CLIs. These are generally the forecast cubes and historic truths, and in some instances a land-sea mask is also required.

Parameters:
  • cubes (List[Cube]) – A list of input cubes which will be split into relevant groups. These include the historical forecasts, in the format supported by the calibration CLIs, and the truth cubes.

  • truth_attribute (str) – An attribute and its value in the format of “attribute=value”, which must be present on truth cubes.

Return type:

Tuple[Cube, Cube, Optional[Cube]]

Returns:

  • A cube containing all the historic forecasts.

  • A cube containing all the truth data.

  • If a land-sea mask is found within the input cubes list it is returned, else None is returned.

Raises:
  • ValueError – An unexpected number of distinct cube names were passed in.

  • IOError – More than one cube was identified as a land-sea mask.

  • IOError – Missing truth or historical forecast in input cubes.
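The truth_attribute string drives the split between truths and forecasts. The sketch below illustrates how an "attribute=value" string might be parsed and matched, using dictionaries in place of cubes; it is not the improver implementation, and the function name `split_by_attribute`, the attribute name `data_source` and its values are hypothetical. Land-sea mask handling is omitted.

```python
def split_by_attribute(items, truth_attribute):
    """Partition items into (forecasts, truths) using an 'attribute=value' string."""
    name, value = truth_attribute.split("=", 1)
    truths = [item for item in items if item["attributes"].get(name) == value]
    forecasts = [item for item in items if item["attributes"].get(name) != value]
    if not truths or not forecasts:
        raise IOError("Missing truth or historical forecast in input cubes.")
    return forecasts, truths


# Stand-ins for cubes: each has a name and a dictionary of attributes.
items = [
    {"name": "air_temperature", "attributes": {"data_source": "observation"}},
    {"name": "air_temperature", "attributes": {"data_source": "ensemble"}},
]

forecasts, truths = split_by_attribute(items, "data_source=observation")
```

Splitting on the first "=" only means that attribute values containing "=" are still parsed correctly.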

split_netcdf_parquet_pickle(files)

Split the input files into netcdf, parquet, and pickle files. Only a single pickle file is expected.

Parameters:

files – A list of input file paths which will be split into pickle, parquet, and netcdf files.

Returns:

  • A flattened cube list containing all the cubes contained within the provided paths to NetCDF files.

  • A list of paths to Parquet files.

  • A loaded pickle file.

Raises:

ValueError – If multiple pickle files are provided, as only one is ever expected.
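As a rough illustration of the grouping step, the sketch below sorts paths by file extension and enforces the single-pickle rule. It assumes that file type can be inferred from the suffix, which the documentation does not confirm, and it returns paths only: the real function loads the NetCDF files into cubes and loads the pickle file. The function name `split_paths` and the file names are hypothetical.

```python
from pathlib import Path


def split_paths(files):
    """Group paths by suffix into (netcdf, parquet, pickle), allowing one pickle at most."""
    netcdf, parquet, pickles = [], [], []
    for path in map(Path, files):
        suffix = path.suffix.lower()
        if suffix == ".nc":
            netcdf.append(path)
        elif suffix == ".parquet":
            parquet.append(path)
        elif suffix in (".pkl", ".pickle"):
            pickles.append(path)
    if len(pickles) > 1:
        raise ValueError("Multiple pickle files provided; only one is expected.")
    return netcdf, parquet, (pickles[0] if pickles else None)


netcdf, parquet, pickle_path = split_paths(
    ["forecast.nc", "truth.parquet", "history.parquet", "model.pkl"]
)
```

Files with an unrecognised suffix are ignored in this sketch; whether the real function ignores or rejects them is not stated in the documentation.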

validity_time_check(forecast, validity_times)

Check that the validity time of the forecast matches one of the accepted validity times within the validity times list.

Parameters:
  • forecast (Cube) – Cube containing the forecast to be calibrated.

  • validity_times (List[str]) – Times at which the forecast must be valid. Each time must be provided as a four-digit string (HHMM), where the first two digits represent the hour and the last two digits represent the minutes, e.g. 0300 or 0315. If the forecast provided is at a different validity time then no coefficients will be applied.

Return type:

bool

Returns:

If the validity time within the cube matches a validity time within the validity time list, then True is returned. Otherwise, False is returned.
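The HHMM comparison can be sketched with the standard library as follows. The example works on a datetime rather than a cube, so extracting the validity time from the forecast's time coordinate is left out, and the function name `is_accepted_validity_time` is hypothetical.

```python
from datetime import datetime, timezone


def is_accepted_validity_time(validity_time, validity_times):
    """Return True if the time of day, formatted as HHMM, is in validity_times."""
    return validity_time.strftime("%H%M") in validity_times


# A forecast valid at 03:15 UTC on 9 January 2017.
forecast_time = datetime(2017, 1, 9, 3, 15, tzinfo=timezone.utc)

matches = is_accepted_validity_time(forecast_time, ["0300", "0315"])
no_match = is_accepted_validity_time(forecast_time, ["0300", "0600"])
```

Only the time of day is compared, so the same list of validity times applies to forecasts on any date.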