improver.categorical.modal_code module#

Module containing a plugin to calculate the modal category in a period.

class BaseModalCategory(decision_tree)[source]#

Bases: BasePlugin

Base plugin for modal weather symbol plugins.

__init__(decision_tree)[source]#

Set up base plugin.

Parameters:

decision_tree (Dict) – The decision tree used to generate the categories and which contains the mapping of day and night categories and of category groupings.

_abc_impl = <_abc._abc_data object>#
_prepare_input_cubes(cubes, record_run_attr=None, model_id_attr=None)[source]#

Prepare the input cubes by adding supplementary coordinates as required and merging the input cubes.

Parameters:
  • cubes (CubeList) – Input cubes on which to store the metadata required.

  • record_run_attr (Optional[str]) – Attribute to record the run information. Defaults to None.

  • model_id_attr (Optional[str]) – Attribute to record the model_id information. Defaults to None.

Return type:

Cube

Returns:

Merged cube with metadata added as required.

_prepare_result_cube(cube, cubes, result, record_run_attr=None, model_id_attr=None)[source]#

Update the result cube with metadata from the input cubes.

Parameters:
  • cube (Cube) – Input cube

  • cubes (CubeList) – Input cubelist.

  • result (Cube) – Result cube.

  • record_run_attr (Optional[str]) – Attribute to record the run information. Defaults to None.

  • model_id_attr (Optional[str]) – Attribute to record the model_id information. Defaults to None.

Raises:

ValueError – If the time coordinate on the input cube does not represent consistent periods.

Return type:

Cube

Returns:

Cube with updated metadata.

_unify_day_and_night(cube)[source]#

Remove distinction between day and night codes so they can each contribute when calculating the modal code. The cube of categorical data is modified in place with all night codes made into their daytime equivalents.

Parameters:

cube (Cube) – A cube of categorical data.

class ModalCategory(decision_tree, model_id_attr=None, record_run_attr=None)[source]#

Bases: BaseModalCategory

Plugin that returns the modal category over the period spanned by the input data. In cases of a tie in the mode values, scipy returns the smaller value. The opposite is desirable in this case, as the significance / importance of the weather code categories generally increases with the value. To achieve this, the categories are subtracted from an arbitrary larger number prior to calculating the mode, and this operation is reversed before the final output is returned.

If there are many different categories for a single point over the time spanned by the input cubes, the returned mode may not be robust. Given the preference to return more significant categories explained above, a 12 hour period with 12 different categories, one of which is severe, will return that severe category to describe the whole period. This is unlikely to be a good representation. In these cases grouping is used to try to select a suitable category (e.g. a rain shower if the codes include a mix of rain showers and dynamic rain) by providing a more robust mode. The lowest number (least significant) member of the group is returned as the code. Use of the least significant member reflects the lower certainty in the forecasts.

Where there are different categories available for night and day, the modal code returned is always a day code, regardless of the times covered by the input files.
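The tie-breaking approach described above can be sketched in plain Python. This is a minimal illustration rather than the plugin's implementation: `smallest_mode` is a hypothetical stand-in that mimics the smaller-value tie behaviour attributed to scipy, and the `offset` of 100 and the example codes are assumptions.

```python
from collections import Counter


def smallest_mode(values):
    """Return the most common value, preferring the smallest value in a tie."""
    counts = Counter(values)
    highest_count = max(counts.values())
    return min(value for value, count in counts.items() if count == highest_count)


def most_significant_mode(values, offset=100):
    """Return the most common value, preferring the largest value in a tie.

    `offset` is an assumed constant larger than any category value.
    """
    # Subtracting from the offset reverses the ordering of the categories.
    flipped = [offset - value for value in values]
    # Reverse the subtraction to recover the original category value.
    return offset - smallest_mode(flipped)


codes = [1, 1, 15, 15, 7]  # illustrative codes only
print(smallest_mode(codes))  # 1
print(most_significant_mode(codes))  # 15
```

Flipping the values means that the smaller-value tie rule selects what was originally the larger, more significant category.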

__init__(decision_tree, model_id_attr=None, record_run_attr=None)[source]#

Set up plugin and create an aggregator instance for reuse.

Parameters:
  • decision_tree (Dict) – The decision tree used to generate the categories and which contains the mapping of day and night categories and of category groupings.

  • model_id_attr (Optional[str]) – Name of attribute recording source models that should be inherited by the output cube. The source models are expected as a space-separated string.

  • record_run_attr (Optional[str]) – Name of attribute used to record models and cycles used in constructing the categories.

_abc_impl = <_abc._abc_data object>#
_code_groups()[source]#

Determines code groupings from the decision tree.

Return type:

Dict

_group_codes(modal, cube)[source]#

In instances where the mode returned is not significant, i.e. the category chosen occurs infrequently in the period, the codes can be grouped to yield a more definitive period code. Given the uncertainty, the least significant category (lowest number in a group that is found in the data) is used to replace the other data values that belong to that group prior to recalculating the modal code.

The modal cube is modified in place.

Parameters:
  • modal (Cube) – The modal categorical cube which contains UNSET_CODE_INDICATOR values that need to be replaced with a more definitive period code.

  • cube (Cube) – The original input data. Data relating to unset points will be grouped and the mode recalculated.
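The grouping step for a single point can be sketched as follows. This is a simplified illustration on a plain list rather than a cube; the `group_codes` helper, the group name and the code values are all hypothetical.

```python
from collections import Counter


def group_codes(codes, groups):
    """Replace each grouped code with the lowest group member present in `codes`."""
    replaced = list(codes)
    for members in groups.values():
        present = [code for code in codes if code in members]
        if not present:
            continue
        # The least significant (lowest) member found in the data is used.
        representative = min(present)
        replaced = [representative if code in members else code for code in replaced]
    return replaced


# Hypothetical grouping and period of codes, for illustration only.
groups = {"rain": [10, 12, 14, 15]}
codes = [1, 3, 7, 10, 12, 14, 15, 8]
grouped = group_codes(codes, groups)
print(grouped)  # [1, 3, 7, 10, 10, 10, 10, 8]
print(Counter(grouped).most_common(1)[0])  # (10, 4)
```

Before grouping, every code occurs once and no mode is meaningful. After grouping, the lowest rain code accounts for half of the period.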

static _set_blended_times(cube)[source]#

Updates time coordinates so that time point is at the end of the time bounds, blend_time and forecast_reference_time (if present) are set to the end of the bound period and bounds are removed, and forecast_period is updated to match.

Return type:

None

mode_aggregator(data, axis)[source]#

An aggregator for use with iris to calculate the mode along the specified axis. If the modal value selected comprises less than 30% of data along the dimension being collapsed, the value is set to the UNSET_CODE_INDICATOR to indicate that the uncertainty was too high to return a mode.

Parameters:
  • data (ndarray) – The data for which a mode is to be calculated.

  • axis (int) – The axis / dimension over which to calculate the mode.

Return type:

ndarray

Returns:

The data array collapsed over axis, containing the calculated modes.
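A minimal sketch of the 30% threshold for a single point is given below. It works on a plain list rather than an ndarray, the value of `UNSET_CODE_INDICATOR` is an assumed placeholder, and `mode_or_unset` is a hypothetical helper. Preferring the larger code in a tie follows the class description.

```python
from collections import Counter

UNSET_CODE_INDICATOR = -99  # assumed placeholder value, not taken from the source
MINIMUM_PERCENT = 30  # the 30% threshold described above


def mode_or_unset(values):
    """Return the mode, or the unset indicator if the mode is too infrequent."""
    counts = Counter(values)
    highest_count = max(counts.values())
    # Integer comparison of highest_count / len(values) < 30%.
    if highest_count * 100 < MINIMUM_PERCENT * len(values):
        return UNSET_CODE_INDICATOR
    # Prefer the larger (more significant) code in a tie.
    return max(value for value, count in counts.items() if count == highest_count)


print(mode_or_unset([1, 1, 1, 3, 7, 8, 10, 12, 14, 15]))  # 1
print(mode_or_unset(list(range(1, 13))))  # -99
```

In the first case the mode makes up exactly 30% of the data, which is not less than the threshold, so it is kept.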

process(cubes)[source]#

Calculate the modal categorical code, with handling for edge cases.

Parameters:

cubes (CubeList) – A list of categorical cubes at different times. A modal code will be calculated over the time coordinate to return the most common code, which is taken to be the best representation of the whole period.

Return type:

Cube

Returns:

A single categorical cube with time bounds that span those of the input categorical cubes.

class ModalFromGroupings(decision_tree, broad_categories, wet_categories, intensity_categories=None, day_weighting=1, day_start=6, day_end=18, wet_bias=1, model_id_attr=None, record_run_attr=None)[source]#

Bases: BaseModalCategory

Plugin that creates a modal weather code over a period using a grouping approach. First, the codes are split into wet and dry groupings. Second, further groupings of the wet codes can be provided, such as “extreme”, “frozen” and “liquid”. These groupings can be controlled in two ways. A day weighting can be applied so that daytime hours are weighted more heavily. A wet bias can also be applied, so that wet codes are given a larger weight, as they are considered more impactful. A second categorisation is then available for the wet codes. This is useful when, for example, a period is represented by a variety of frozen precipitation weather codes, so that a frozen precipitation weather code can be diagnosed as an appropriate summary. The ignore intensity option allows light and heavy weather types to be considered together when ascertaining the most common weather type. The final daily symbol will be the most common of the light and heavy input codes of the chosen type.

The ordering of the codes within the category dictionaries guides which category is selected in the event of a tie, with preference given to the lowest index. Ordering the codes within the category dictionaries from most significant to least significant helps to ensure that the most significant code is returned in the event of a tie, if desired.

Where there are different categories available for night and day, the modal code returned is always a day code, regardless of the times covered by the input files.

If a location is to return a dry code after consideration of the various weightings, the wet codes for that location are converted into the best matching dry cloud code and these are included in determining the resulting dry code. The wet bias has no impact on the weight of these converted wet codes, but the day weighting still applies.

DAY_LENGTH = 24#
__init__(decision_tree, broad_categories, wet_categories, intensity_categories=None, day_weighting=1, day_start=6, day_end=18, wet_bias=1, model_id_attr=None, record_run_attr=None)[source]#

Set up plugin.

Parameters:
  • decision_tree (Dict) – The decision tree used to generate the categories and which contains the mapping of day and night categories and of category groupings.

  • broad_categories (Dict[str, int]) – Dictionary defining the broad categories for grouping the weather symbol codes. This is expected to have the keys: “dry” and “wet”.

  • wet_categories (Dict[str, int]) – Dictionary defining groupings for the wet categories. No specific names for the keys are required. Keys and values within the dictionary should both be ordered in terms of descending priority.

  • intensity_categories (Optional[Dict[str, int]]) – Dictionary defining intensity groupings. Values should be ordered in terms of descending priority. The most common weather code from the options available representing different intensities will be used as the representative weather code.

  • day_weighting (int) – Weighting to apply to daytime weather codes. A weighting of 1 indicates the default weighting. A weighting of 2 indicates that the weather codes during the daytime period will be duplicated, so that they count twice as much when computing a representative weather code.

  • day_start (int) – Hour defining the start of the daytime period.

  • day_end (int) – Hour defining the end of the daytime period.

  • wet_bias (int) – Bias to apply to wet weather codes. A bias of 1 indicates the default, where half of the codes need to be wet in order to generate a wet code. A bias of 3 indicates that only a quarter of the codes are required to be wet in order to generate a wet symbol. To generate a wet symbol, the fraction of wet symbols therefore needs to be greater than or equal to 1 / (1 + wet_bias).

  • model_id_attr (Optional[str]) – Name of attribute recording source models that should be inherited by the output cube. The source models are expected as a space-separated string.

  • record_run_attr (Optional[str]) – Name of attribute used to record models and cycles used in constructing the categories.
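The wet_bias threshold of 1 / (1 + wet_bias) can be illustrated for a single point. This is a sketch under stated assumptions: `is_wet_period` is a hypothetical helper, the code values are illustrative, and any day weighting is ignored.

```python
def is_wet_period(codes, wet_codes, wet_bias=1):
    """Return True if the fraction of wet codes is at least 1 / (1 + wet_bias)."""
    n_wet = sum(1 for code in codes if code in wet_codes)
    # n_wet / len(codes) >= 1 / (1 + wet_bias), rearranged to avoid division.
    return n_wet * (1 + wet_bias) >= len(codes)


# Illustrative codes only: two wet codes out of eight.
codes = [1, 1, 3, 3, 7, 7, 12, 15]
wet_codes = {12, 15}
print(is_wet_period(codes, wet_codes, wet_bias=1))  # False
print(is_wet_period(codes, wet_codes, wet_bias=3))  # True
```

With a quarter of the codes wet, the default bias of 1 gives a dry result, while a bias of 3 lowers the threshold to a quarter and gives a wet result.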

_abc_impl = <_abc._abc_data object>#
_consolidate_intensity_categories(cube)[source]#

Consolidate weather codes representing different intensities of precipitation. This can help with computing a representative weather code.

Parameters:

cube (Cube) – Weather codes cube.

Return type:

Cube

Returns:

Weather codes cube with intensity categories consolidated, if intensity categories are provided.

_emphasise_day_period(cube)[source]#

A day weighting can be set which biases the forecasts towards daytime hours, e.g. 6am-6pm. This is achieved by counting the number of input times available (e.g. hourly), taking those from 18 times before the end up to 6 times before the end, and duplicating these symbols according to the integer weighting. This approach is taken to accommodate different timezones without the need for any timezone awareness. Inputs are always provided from midnight to midnight, or ending at midnight if a partial day is provided. The middle of the set of input times therefore corresponds to the local middle of the day. Counting back from the end of the period accommodates partial periods (same day updates). The index counted backwards is clipped to 0, so if only 12 files are passed in (because the update is performed around midday), the first index will be 0, rather than -6, and only symbols from 6 periods will be multiplied up by the day_weighting.

Metadata is not used to select the day period as the times recorded within the cubes are all UTC, rather than local time, so the local day period cannot be identified. The time and forecast_period coordinates are incremented by the minimum arbitrary amount (1 second) to ensure non-duplicate coordinates.

Parameters:

cube (Cube) – Weather codes cube.

Return type:

Cube

Returns:

Cube with more times during the daytime period, so that daytime hours are emphasised, depending upon the day_weighting chosen.
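The index arithmetic described above can be sketched on a plain list of hourly codes. This is an illustration only: the real method operates on cubes and adjusts time coordinates, and both the `emphasise_day_period` function here and the half-open slice it uses are assumptions.

```python
DAY_LENGTH = 24


def emphasise_day_period(codes, day_weighting=1, day_start=6, day_end=18):
    """Append extra copies of the daytime codes, counting back from the end."""
    n_times = len(codes)
    # Count back from the end of the period and clip at 0 for partial days.
    first = max(n_times - (DAY_LENGTH - day_start), 0)
    last = max(n_times - (DAY_LENGTH - day_end), 0)
    daytime = codes[first:last]
    return list(codes) + list(daytime) * (day_weighting - 1)


full_day = list(range(24))  # stand-in values, one per hour
print(len(emphasise_day_period(full_day, day_weighting=2)))  # 36

half_day = list(range(12))  # a partial day with 12 inputs
print(len(emphasise_day_period(half_day, day_weighting=2)))  # 18
```

For the partial day, the first index is clipped to 0 rather than -6, so only 6 inputs are duplicated.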

_find_intensity_indices(cube)[source]#

Find which points / sites include any weather code predictions that fall within the intensity categories.

Parameters:

cube (Cube) – Weather code cube.

Return type:

ndarray

Returns:

Boolean array that is True if any weather code from the intensity categories is found at a given point, otherwise False.

_find_most_significant_dry_code(cube, result, dry_indices)[source]#

Find the most significant dry weather code at each point.

Parameters:
  • cube (Cube) – Weather code cube.

  • result (Cube) – Cube into which to put the result.

  • dry_indices (ndarray) – Boolean array that is True where the weather codes at that point are dry.

Return type:

Cube

Returns:

Cube where points that are dry are filled with the most common dry code present at that point. If there is a tie, the most significant dry weather code is used, assuming that higher weather code values indicate more significant weather.

_find_wet_indices(cube, time_axis)[source]#

Identify the points at which a wet weather code should be selected. This can include a wet bias if supplied.

Parameters:
  • cube (Cube) – Weather codes cube.

  • time_axis (int) – The time coordinate dimension.

Return type:

ndarray

Returns:

Boolean array that is True if the weather codes are wet, or False otherwise.

_get_dry_equivalents(cube, dry_indices, time_axis)[source]#

Returns a cube with only dry codes in which all wet codes have been replaced by their nearest dry cloud equivalent. For example a shower code is replaced with a partly cloudy code, a light rain code is replaced with a cloud code, and a heavy rain code is replaced with an overcast cloud code.

Parameters:
  • cube (Cube) – Weather code cube.

  • dry_indices (ndarray) – An array of bools which are true for locations where the summary weather code will be dry.

  • time_axis (int) – The time coordinate dimension.

Return type:

Cube

Returns:

Wet codes converted to their dry equivalent for those points that will receive a dry summary weather code.

_get_most_likely_following_grouping(cube, result, categories, required_indices, time_axis, categorise_using_modal)[source]#

Determine the most common category and subcategory using a dictionary defining the categorisation. The category could be a group of weather codes representing frozen precipitation, where the subcategory would be the individual weather codes, so that this method is able to identify the most likely weather code within the most likely weather code category. If a category or subcategory is tied, then the first as defined within the categories dictionary is taken. As the categories and subcategories within the dictionary are expected to be in descending priority order, this will ensure that the highest priority item is chosen in the event of a tie.

Parameters:
  • cube (Cube) – Weather codes cube.

  • result (Cube) – Cube in which to put the result.

  • categories (Dict) – Dictionary defining the categories (keys) and subcategories (values). The most likely category and then the most likely value for the subcategory is put into the result cube.

  • required_indices (ndarray) – Boolean indicating which indices within the result cube to fill.

  • time_axis (int) – The time coordinate dimension.

  • categorise_using_modal (bool) – Boolean defining whether the top level categorisation should use the input cube or the processed result cube. The input cube will have a time dimension, whereas the result cube will not have a time dimension.

Returns:

A result cube containing the most appropriate weather code following categorisation.
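The two-level selection and its tie-breaking can be sketched for a single point. This is a simplified illustration: `most_likely_code` is a hypothetical helper, the category names and code values are illustrative, and the subcategories are assumed to be lists of codes in descending priority.

```python
def most_likely_code(codes, categories):
    """Pick the most common category, then the most common code within it."""
    category_counts = {
        name: sum(1 for code in codes if code in members)
        for name, members in categories.items()
    }
    # max() returns the first maximal item, so dictionary order breaks ties.
    best_category = max(category_counts, key=category_counts.get)
    members = categories[best_category]
    # The same rule applies within the category: list order breaks ties.
    return max(members, key=codes.count)


# Illustrative categories, listed in descending priority.
categories = {"frozen": [27, 26, 24], "liquid": [15, 14, 12]}
print(most_likely_code([27, 24, 24, 15, 12, 1], categories))  # 24
print(most_likely_code([27, 24, 15, 12], categories))  # 27
```

In the second call both categories and both frozen codes are tied, so the first category and the first code within it are chosen.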

static _promote_time_coords(cube, template_cube)[source]#

Promote the time coordinate, so that cubes can be concatenated along the time coordinate. Concatenation, rather than merging, helps to ensure consistent output, as merging can lead to other coordinates e.g. forecast_reference_time and forecast_period being made the dimension coordinate.

Parameters:
  • cube (Cube) – Cube with time coordinates.

  • template_cube (Cube) – Cube to provide coordinates associated with the time coordinate that will be added to the output cube.

Return type:

Cube

Returns:

A cube with a time dimension coordinate, with the other time-related coordinates associated with that dimension.

static _set_blended_times(cube, result)[source]#

Updates time coordinates so that time point is at the end of the time bounds, blend_time and forecast_reference_time (if present) are set to the end of the bound period and bounds are removed, and forecast_period is updated to match. The result cube is modified in-place.

Parameters:
  • cube (Cube) – Cube containing metadata on the temporal coordinates that will be used to add the relevant metadata to the result cube.

  • result (Cube) – Cube containing the computed modal weather code. This cube will be updated in-place.

Return type:

None

static counts_per_category(data, bin_max)[source]#

Use np.bincount to count the number of occurrences within each category, so that the most common occurrence can then be found. Implemented following https://stackoverflow.com/questions/46256279/bin-elements-per-row-vectorized-2d-bincount-for-numpy/46256361#46256361

Parameters:
  • data (ndarray) – Array where occurrences of each possible integer value between 0 and data.max() will be counted.

  • bin_max (int) – Integer defining the number of categories expected.

Return type:

ndarray

Returns:

An array of counts for the occurrence of each category within each row.
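A row-wise count using np.bincount can be sketched as follows. This is a minimal version of the general offset technique, not necessarily the plugin's exact code, and it assumes a 2D array of non-negative integers that are all smaller than bin_max.

```python
import numpy as np


def counts_per_category(data, bin_max):
    """Count occurrences of each integer in [0, bin_max) within each row."""
    n_rows = data.shape[0]
    # Shift each row into its own block of bin_max integers.
    offset = data + bin_max * np.arange(n_rows)[:, np.newaxis]
    # A single flat bincount then yields the counts for every row at once.
    counts = np.bincount(offset.ravel(), minlength=n_rows * bin_max)
    return counts.reshape(n_rows, bin_max)


data = np.array([[0, 1, 1, 2], [2, 2, 2, 0]])
print(counts_per_category(data, bin_max=3).tolist())  # [[1, 2, 1], [1, 0, 3]]
```

Taking the argmax of each row of the result gives the most common category, with the lowest index preferred in a tie.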

process(cubes)[source]#

Calculate the modal categorical code by grouping weather codes.

Parameters:

cubes (CubeList) – A list of categorical cubes at different times. A modal code will be calculated over the time coordinate to return the most common code, which is taken to be the best representation of the whole period.

Return type:

Cube

Returns:

A single categorical cube with time bounds that span those of the input categorical cubes.