improver.calibration.utilities module
This module defines all the utilities used by the “plugins” specific to ensemble calibration.
- broadcast_data_to_time_coord(cubelist)
Ensure that the data from all cubes within a cubelist is of the required shape by broadcasting the data from cubes without a time coordinate along the time dimension taken from other input cubes that do have a time coordinate. In the case where none of the input cubes have a time coordinate that is a dimension coordinate, which may occur when using a very small training dataset, the data is returned without being broadcast.
- Parameters:
cubelist (CubeList) – The cubelist from which the data will be extracted and broadcast along the time dimension as required.
- Returns:
The data taken from cubes within a cubelist where cubes without a time coordinate have had their data broadcast along the time dimension (with this time dimension provided by other input cubes with a time dimension) to ensure that the data within each numpy array within the output list has the same shape. If a time dimension coordinate is not present on any of the cubes, no broadcasting occurs.
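A minimal usage sketch with small synthetic cubes; the coordinate names, values and diagnostic names below are illustrative rather than taken from the module:

```python
import numpy as np
from iris.cube import Cube, CubeList
from iris.coords import DimCoord

from improver.calibration.utilities import broadcast_data_to_time_coord

# Illustrative cubes: a forecast with a time dimension and a static field without one.
time = DimCoord([0, 24, 48], standard_name="time", units="hours since 2017-01-09 00:00:00")
site = DimCoord([0, 1], long_name="spot_index", units="1")
forecast = Cube(
    np.zeros((3, 2), dtype=np.float32),
    long_name="forecast_data",
    dim_coords_and_dims=[(time, 0), (site, 1)],
)
static = Cube(
    np.array([1.5, 2.5], dtype=np.float32),
    long_name="static_data",
    dim_coords_and_dims=[(site, 0)],
)

arrays = broadcast_data_to_time_coord(CubeList([forecast, static]))
# Expected: both returned numpy arrays have shape (3, 2), the static data having
# been broadcast along the 3-point time dimension of the other cube.
```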
- check_data_sufficiency(historic_forecasts, truths, point_by_point, proportion_of_nans)
Check whether there is sufficient valid data (i.e. values that are not NaN) within the historic forecasts and truths, in order to robustly compute EMOS coefficients.
- Parameters:
historic_forecasts (Cube) – Cube containing historic forecasts.
truths (Cube) – Cube containing truths.
point_by_point (bool) – If True, coefficients are calculated independently for each point within the input cube by creating an initial guess and minimising each grid point independently.
proportion_of_nans (float) – The proportion of the matching historic forecast-truth pairs that are allowed to be NaN.
- Raises:
ValueError – If the proportion of NaNs is higher than allowable for a site, if using point_by_point.
ValueError – If the proportion of NaNs is higher than allowable when considering all sites.
- check_forecast_consistency(forecasts)
Checks that the forecast cubes have a consistent forecast reference time hour and a consistent forecast period.
- Parameters:
forecasts (Cube)
- Raises:
ValueError – Forecast cubes have differing forecast reference time hours
ValueError – Forecast cubes have differing forecast periods
- check_predictor(predictor)
Check the predictor at the start of the process methods in relevant ensemble calibration plugins, to avoid having to check and raise an error later. The string is also converted to lowercase.
- Parameters:
predictor (str) – String to specify the form of the predictor used to calculate the location parameter when estimating the EMOS coefficients. Currently the ensemble mean (“mean”) and the ensemble realizations (“realizations”) are supported as the predictors.
- Returns:
The predictor string in lowercase.
- Raises:
ValueError – If the predictor is not valid.
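A short usage sketch, following the behaviour documented above (supported values are “mean” and “realizations”, case-insensitively):

```python
from improver.calibration.utilities import check_predictor

print(check_predictor("Mean"))          # "mean"
print(check_predictor("REALIZATIONS"))  # "realizations"

try:
    check_predictor("median")           # not a supported predictor
except ValueError as err:
    print(err)
```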
- convert_cube_data_to_2d(forecast, coord='realization', transpose=True)
Function to convert data from an N-dimensional cube into a 2d numpy array. The result can be transposed, if required.
- Parameters:
forecast (Cube) – N-dimensional cube to be reshaped.
coord (str) – This dimension is retained as the second dimension by default, and the leading dimension if “transpose” is set to False.
transpose (bool) – If True, the resulting flattened data is transposed. This will transpose a 2d array of the format [coord, :] to [:, coord]. If coord is not a dimension on the input cube, the resulting array will be 2d with items of length 1.
- Returns:
Reshaped 2d array.
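A minimal sketch with a small synthetic realization cube; the grid, coordinate values and diagnostic name are illustrative, and the expected shape follows from the documented default of keeping the named coordinate as the second dimension:

```python
import numpy as np
from iris.cube import Cube
from iris.coords import DimCoord

from improver.calibration.utilities import convert_cube_data_to_2d

# Illustrative 3-realization forecast on a 4 x 5 grid.
realization = DimCoord(np.array([0, 1, 2], dtype=np.int32), standard_name="realization", units="1")
latitude = DimCoord(np.linspace(-10, 10, 4, dtype=np.float32), standard_name="latitude", units="degrees")
longitude = DimCoord(np.linspace(0, 10, 5, dtype=np.float32), standard_name="longitude", units="degrees")
forecast = Cube(
    np.arange(60, dtype=np.float32).reshape(3, 4, 5),
    standard_name="air_temperature",
    units="K",
    dim_coords_and_dims=[(realization, 0), (latitude, 1), (longitude, 2)],
)

flattened = convert_cube_data_to_2d(forecast)
# With the default transpose=True, the expected shape is (grid points, realizations),
# i.e. (20, 3) here.
```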
- convert_parquet_to_cube(forecast, truth, forecast_period, cycletime, training_length, diagnostic, percentiles, experiment)
Function to convert a parquet file containing forecast and truth data into a CubeList for use in calibration.
- Parameters:
forecast (pathlib.Path) – The path to a Parquet file containing the historical forecasts to be used for calibration. The expected columns within the Parquet file are: forecast, blend_time, forecast_period, forecast_reference_time, time, wmo_id, percentile, diagnostic, latitude, longitude, period, height, cf_name, units.
truth (pathlib.Path) – The path to a Parquet file containing the truths to be used for calibration. The expected columns within the Parquet file are: ob_value, time, wmo_id, diagnostic, latitude, longitude and altitude.
forecast_period (int) – Forecast period to be calibrated in seconds.
cycletime (str) – Cycletime of a format similar to 20170109T0000Z.
training_length (int) – Number of days within the training period.
diagnostic (str) – The name of the diagnostic to be calibrated within the forecast and truth tables. This name is used to filter the Parquet file when reading from disk.
percentiles (List[float]) – The set of percentiles to be used for estimating coefficients. These should be a set of equally spaced quantiles.
experiment (str) – A value within the experiment column to select from the forecast table.
- Returns:
A CubeList containing the forecast and truth cubes, with the forecast cube containing the percentiles as an auxiliary coordinate.
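A call sketch only: the file paths are placeholders for Parquet files with the columns listed above, and the argument values (diagnostic, percentiles, experiment) are illustrative rather than prescriptive:

```python
from pathlib import Path

from improver.calibration.utilities import convert_parquet_to_cube

# Placeholder paths; substitute Parquet files with the expected columns.
forecast_path = Path("historic_forecasts.parquet")
truth_path = Path("truths.parquet")

cubes = convert_parquet_to_cube(
    forecast_path,
    truth_path,
    forecast_period=6 * 3600,        # 6 hours, expressed in seconds
    cycletime="20170109T0000Z",
    training_length=30,              # days within the training period
    diagnostic="air_temperature",
    percentiles=[25.0, 50.0, 75.0],  # equally spaced quantiles
    experiment="control",            # value from the experiment column
)
# Expected: a CubeList containing the forecast and truth cubes.
```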
- create_unified_frt_coord(forecast_reference_time)
Constructs a single forecast reference time coordinate from a multi-valued coordinate. The new coordinate records the maximum range of bounds of the input forecast reference times, with the point value set to the latest of those in the inputs.
- Parameters:
forecast_reference_time (DimCoord) – The forecast_reference_time coordinate to be used in the coordinate creation.
- Returns:
A dimension coordinate containing the forecast reference time coordinate with suitable bounds. The coordinate point is that of the latest contributing forecast.
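A minimal sketch with an illustrative multi-valued forecast_reference_time coordinate (the epoch-based point values are arbitrary examples):

```python
import numpy as np
from iris.coords import DimCoord

from improver.calibration.utilities import create_unified_frt_coord

# Three cycles at 00Z on consecutive days, expressed in hours since the epoch.
frt = DimCoord(
    np.array([412896, 412920, 412944], dtype=np.int64),
    standard_name="forecast_reference_time",
    units="hours since 1970-01-01 00:00:00",
)

unified = create_unified_frt_coord(frt)
# Expected: unified.points is the latest of the input times, and unified.bounds
# spans from the earliest to the latest of the inputs.
```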
- filter_non_matching_cubes(historic_forecast, truth)
Provide filtering for the historic forecast and truth to make sure that these contain matching validity times. This ensures that any mismatch between the historic forecasts and truth is dealt with. If multiple time slices of the historic forecast match with the same truth slice, only the first truth slice is kept to avoid duplicate truth slices, which would prevent the truth cubes from being merged. This can occur when processing a cube with a multi-dimensional time coordinate. If a historic forecast time slice contains only NaNs, then this time slice is also skipped. This can occur when processing a multi-dimensional time coordinate where some of the forecast reference time and forecast period combinations do not typically occur, so may be filled with NaNs.
- Parameters:
historic_forecast (Cube) – Cube containing the historic forecasts.
truth (Cube) – Cube containing the truths.
- Returns:
Cube of historic forecasts where any mismatches with the truth cube have been removed.
Cube of truths where any mismatches with the historic_forecasts cube have been removed.
- Raises:
ValueError – The filtering has found no matches in validity time between the historic forecasts and the truths.
- flatten_ignoring_masked_data(data_array, preserve_leading_dimension=False)
Flatten an array, selecting only valid data if the array is masked. There is also the option to reshape the resulting array so it has the same leading dimension as the input array, but the other dimensions of the array are flattened. It is assumed that each of the slices along the leading dimension are masked in the same way. This functionality is used in EstimateCoefficientsForEnsembleCalibration when realizations are used as predictors.
- Parameters:
data_array (Union[MaskedArray, ndarray]) – An array or masked array to be flattened. If it is masked and the leading dimension is preserved, the mask must be the same for every slice along the leading dimension.
preserve_leading_dimension (bool) – Default False. If True, the flattened array is reshaped so it has the same leading dimension as the input array. If False, the returned array is 1D.
- Returns:
A flattened array containing only valid data. Either 1D or, if the leading dimension is preserved, 2D. In the latter case the leading dimension is the same as that of the input data_array.
- Raises:
ValueError – If preserving the leading dimension and the mask on the input array is not the same for every slice along the leading dimension.
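A minimal sketch with a small synthetic masked array, where every slice along the leading dimension shares the same mask as required above:

```python
import numpy as np
import numpy.ma as ma

from improver.calibration.utilities import flatten_ignoring_masked_data

# Two slices along the leading dimension, each masked in the same way.
data = ma.masked_array(
    np.arange(12, dtype=np.float32).reshape(2, 2, 3),
    mask=np.tile([[False, False, True], [False, True, True]], (2, 1, 1)),
)

flat = flatten_ignoring_masked_data(data)
# Expected: a 1D array containing the 6 unmasked values.

flat_2d = flatten_ignoring_masked_data(data, preserve_leading_dimension=True)
# Expected: shape (2, 3), with the leading dimension of the input preserved.
```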
- forecast_coords_match(first_cube, second_cube)
Determine if two cubes have equivalent forecast_period and forecast_reference_time coordinates with an accepted leniency. The forecast period is rounded up to the next hour to support calibrating sub-hourly forecasts with coefficients taken from on-the-hour forecasts. For the forecast reference time, only the hour is checked.
- Parameters:
first_cube (Cube) – First cube to compare.
second_cube (Cube) – Second cube to compare.
ValueError – The two cubes are not equivalent.
- get_frt_hours(forecast_reference_time)
Returns a set of integer representations of the hour of the forecast reference time.
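A minimal sketch with an illustrative forecast_reference_time coordinate (the epoch-based point values are arbitrary examples chosen to fall at 03Z):

```python
import numpy as np
from iris.coords import DimCoord

from improver.calibration.utilities import get_frt_hours

# Two cycles at 03Z on consecutive days, expressed in hours since the epoch.
frt = DimCoord(
    np.array([412899, 412923], dtype=np.int64),
    standard_name="forecast_reference_time",
    units="hours since 1970-01-01 00:00:00",
)

print(get_frt_hours(frt))
# Expected: {3}, since both cycles fall at 03Z.
```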
- merge_land_and_sea(calibrated_land_only, uncalibrated)
Merge data that has been calibrated over the land with uncalibrated data. Calibrated data will have masked data over the sea which will need to be filled with the uncalibrated data.
- Parameters:
calibrated_land_only (Cube) – A cube that has been calibrated over the land, with sea points masked out. Either realizations, probabilities or percentiles. Data is modified in place.
uncalibrated (Cube) – A cube of uncalibrated data with valid data over the sea. Either realizations, probabilities or percentiles. Dimension coordinates must be the same as the calibrated_land_only cube.
- Raises:
ValueError – If input cubes do not have the same input dimensions.
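A minimal sketch on a tiny synthetic grid; the mask layout, grid and data values are illustrative:

```python
import numpy as np
import numpy.ma as ma
from iris.cube import Cube
from iris.coords import DimCoord

from improver.calibration.utilities import merge_land_and_sea

latitude = DimCoord(np.arange(3, dtype=np.float32), standard_name="latitude", units="degrees")
longitude = DimCoord(np.arange(3, dtype=np.float32), standard_name="longitude", units="degrees")

# Calibrated data with sea points masked out.
calibrated_land_only = Cube(
    ma.masked_array(
        np.full((3, 3), 280.0, dtype=np.float32),
        mask=[[0, 0, 1], [0, 1, 1], [1, 1, 1]],
    ),
    standard_name="air_temperature",
    units="K",
    dim_coords_and_dims=[(latitude, 0), (longitude, 1)],
)
# Uncalibrated data that is valid everywhere, on the same grid.
uncalibrated = Cube(
    np.full((3, 3), 278.0, dtype=np.float32),
    standard_name="air_temperature",
    units="K",
    dim_coords_and_dims=[(latitude, 0), (longitude, 1)],
)

merge_land_and_sea(calibrated_land_only, uncalibrated)
# Expected: calibrated_land_only is modified in place, with the masked (sea)
# points filled from the uncalibrated cube.
```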
- prepare_cube_no_calibration(forecast, emos_coefficients, ignore_ecc_bounds_exceedance=False, validity_times=None, percentiles=None, prob_template=None)
Function to add appropriate metadata to cubes that cannot be calibrated. If the forecast can be calibrated then nothing is returned.
- Parameters:
forecast (iris.cube.Cube) – The forecast to be calibrated. The input format could be either realizations, probabilities or percentiles.
emos_coefficients (iris.cube.CubeList) – The EMOS coefficients to be applied to the forecast.
ignore_ecc_bounds_exceedance (bool) – If True, where the percentiles exceed the ECC bounds range, raises a warning rather than an exception. This occurs when the current forecast is in the form of probabilities and is converted to percentiles, as part of converting the input probabilities into realizations.
validity_times (List[str]) – Times at which the forecast must be valid. This must be provided as a four digit string (HHMM) where the first two digits represent the hour and the last two digits represent the minutes e.g. 0300 or 0315. If the forecast provided is at a different validity time then no coefficients will be applied.
percentiles (List[float]) – The set of percentiles used to create the calibrated forecast.
prob_template (iris.cube.Cube) – Optionally, a cube containing a probability forecast that will be used as a template when generating probability output when the input format of the forecast cube is not probabilities i.e. realizations or percentiles. If no coefficients are provided and a probability template is provided, the probability template forecast will be returned as the uncalibrated probability forecast.
- Returns:
The prepared forecast cube or None.