clisops.core package

Core functionality for clisops.

Submodules

clisops.core.average module

Average module.

clisops.core.average.average_over_dims(ds, dims=None, ignore_undetected_dims=False)[source]

Average a DataArray or Dataset over the dimensions specified.

Parameters:

ds (xr.DataArray or xr.Dataset) – Input values.
dims (Sequence[{“time”, “level”, “latitude”, “longitude”}]) – The dimensions over which to apply the average. If None, none of the dimensions are averaged over. Dimensions must be one of [“time”, “level”, “latitude”, “longitude”].
ignore_undetected_dims (bool) – Controls what happens when dimensions specified in dims are not found in the dataset. If True, no exception is raised and the remaining dimensions are averaged over. If False, an exception is raised. Default = False.

Return type:

DataArray | Dataset

Returns:

xr.DataArray or xr.Dataset – New Dataset or DataArray object averaged over the indicated dimensions. The indicated dimensions will have been removed.

Examples

import xarray as xr
from clisops.core.average import average_over_dims

pr = xr.open_dataset(path_to_pr_file).pr

# Average data array over latitude and longitude
prAvg = average_over_dims(pr, dims=["latitude", "longitude"], ignore_undetected_dims=True)

clisops.core.average.average_shape(ds, shape, variable=None)[source]

Average a DataArray or Dataset spatially using vector shapes.

Return a DataArray or Dataset averaged over each Polygon given. Requires xESMF.

Parameters:

ds (xarray.Dataset) – Input values, coordinate attributes must be CF-compliant.
shape (Union[str, Path, gpd.GeoDataFrame]) – Path to shape file, or directly a GeoDataFrame. Supports formats compatible with geopandas. Will be converted to EPSG:4326 if needed.
variable (Union[str, Sequence[str], None]) – Variables to average. If None, average over all data variables.

Return type:

DataArray | Dataset

Returns:

Union[xarray.DataArray, xarray.Dataset] – ds spatially-averaged over the polygon(s) in shape. Has a new geom dimension corresponding to the index of the input GeoDataFrame. Non-geometry columns of the GeoDataFrame are copied as auxiliary coordinates.

Notes

The spatial weights are computed with ESMF, which uses corners given in lat/lon format (EPSG:4326); the input dataset ds must provide those. Unlike subset.subset_shape, the weights computed here take partial overlaps and holes into account.

As xESMF computes the weight masks only once, skipping missing values is not really feasible. Thus, all NaNs propagate when performing the average.
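The NaN propagation described above can be illustrated without xESMF. The following is a minimal sketch in plain Python, assuming a fixed set of hypothetical weights that sum to 1 for one polygon; it is not how clisops or xESMF apply the weights internally.

```python
import math


def weighted_average(values, weights):
    """Average with precomputed weights; a NaN among the values yields NaN."""
    return sum(v * w for v, w in zip(values, weights))


# Hypothetical weights of three grid cells overlapping one polygon (sum to 1).
weights = [0.5, 0.3, 0.2]

complete = weighted_average([1.0, 2.0, 3.0], weights)
with_gap = weighted_average([1.0, math.nan, 3.0], weights)

print(complete)   # regular weighted mean
print(with_gap)   # nan, because the weights are not recomputed around the gap
```

Because the weights are computed once and reused, a missing value cannot be skipped without re-normalizing the remaining weights, which is why the result is NaN.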

Examples

import xarray as xr
from clisops.core.average import average_shape

pr = xr.open_dataset(path_to_pr_file).pr

# Average data array over shape
prAvg = average_shape(pr, shape=path_to_shape_file)

# Average multiple variables in a single dataset
ds = xr.open_mfdataset([path_to_tasmin_file, path_to_tasmax_file])
dsAvg = average_shape(ds, shape=path_to_shape_file)

clisops.core.average.average_time(ds, freq)[source]

Average a DataArray or Dataset over the time frequency specified.

Parameters:

ds (Union[xr.DataArray, xr.Dataset]) – Input values.
freq (str) – The frequency to average over. One of “month”, “year”.

Return type:

DataArray | Dataset

Returns:

Union[xr.DataArray, xr.Dataset] – New Dataset or DataArray object averaged over the indicated time frequency.

Examples

import xarray as xr
from clisops.core.average import average_time

pr = xr.open_dataset(path_to_pr_file).pr

# Average data array over each month
prAvg = average_time(pr, freq="month")
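What averaging by "month" or "year" means can be sketched in plain Python. This is an illustration only: the helper average_by is hypothetical, and it assumes that "month" groups by each calendar month of each year rather than by a climatological month.

```python
from collections import defaultdict
from datetime import date


def average_by(samples, freq):
    """Group (date, value) pairs by month or year and average each group."""
    if freq not in ("month", "year"):
        raise ValueError("freq must be 'month' or 'year'")
    groups = defaultdict(list)
    for day, value in samples:
        # Assumption: months of different years are kept apart.
        key = (day.year, day.month) if freq == "month" else (day.year,)
        groups[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(groups.items())}


samples = [
    (date(2000, 1, 1), 1.0),
    (date(2000, 1, 15), 3.0),
    (date(2000, 2, 1), 5.0),
]
monthly = average_by(samples, "month")
yearly = average_by(samples, "year")
print(monthly)
print(yearly)
```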

clisops.core.regrid module

Regrid module.

class clisops.core.regrid.Grid(ds=None, grid_id=None, grid_instructor=None, compute_bounds=False, mask=None)[source]

Bases: object

Create a Grid object that is suitable to serve as source or target grid of the Weights class.

Pre-processes coordinate variables of input dataset (e.g. create or read dataset from input, reformat, generate bounds, identify duplicated and collapsing cells, determine extent).

Parameters:

ds (xr.Dataset or xr.DataArray, optional) – Uses horizontal coordinates of an xarray.Dataset or xarray.DataArray to create a Grid object. The default is None.
grid_id (str, optional) – Create the Grid object from a selection of pre-defined grids, e.g. “1deg” or “2pt5deg”. The grids are provided via the roocs_grids package (https://github.com/roocs/roocs-grids). A special setting is “adaptive”/“auto”, which requires the parameter ‘ds’ to be specified as well, and creates a regular lat-lon grid of the same extent and approximate resolution as the grid described by ‘ds’. The default is None.
grid_instructor (tuple, float or int, optional) – Create a regional or global regular lat-lon grid using xESMF utility functions. For global grid: grid_instructor = (lon_step, lat_step) or grid_instructor = step; For regional grid: grid_instructor = (lon_start, lon_end, lon_step, lat_start, lat_end, lat_step) or grid_instructor = (start, end, step). The default is None.
compute_bounds (bool, optional) – Compute latitude and longitude bounds if the dataset has none defined. The default is False.
mask (str, optional) – Whether to mask “ocean” cells or “land” cells if a mask variable is found. The default is None.
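The accepted forms of grid_instructor can be summarised with a small sketch that normalises each form to a six-element tuple. The helper expand_grid_instructor is hypothetical, and the global ranges used here (longitude 0 to 360, latitude -90 to 90) are assumptions made for the illustration, not values taken from clisops.

```python
def expand_grid_instructor(grid_instructor):
    """Return (lon_start, lon_end, lon_step, lat_start, lat_end, lat_step)."""
    if isinstance(grid_instructor, (int, float)):
        grid_instructor = (grid_instructor,)
    values = tuple(float(v) for v in grid_instructor)
    if len(values) == 1:
        # Global grid with the same step in both directions.
        step = values[0]
        return (0.0, 360.0, step, -90.0, 90.0, step)
    if len(values) == 2:
        # Global grid: (lon_step, lat_step).
        lon_step, lat_step = values
        return (0.0, 360.0, lon_step, -90.0, 90.0, lat_step)
    if len(values) == 3:
        # Regional grid: (start, end, step) used for both directions.
        start, end, step = values
        return (start, end, step, start, end, step)
    if len(values) == 6:
        # Regional grid, fully specified.
        return values
    raise ValueError("grid_instructor must have 1, 2, 3 or 6 elements")


print(expand_grid_instructor(2.5))
print(expand_grid_instructor((0, 90, 1.5)))
```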

_apply_lsm(mask=None)[source]

Detect mask helper function.

Parameters:

self (Grid) – The Grid object to which the mask will be applied.
mask (str, optional) – Whether to mask “ocean” cells or “land” cells. The default is None.

Return type:

bool

Returns:

bool – Whether self.lsm was assigned to a 2D-mask in form of a xr.DataArray or not.

Raises:

UserWarning – If mask is specified but not found in dataset.

_cap_precision(decimals)[source]

Round horizontal coordinate variables to specified precision using numpy.around.

Parameters:: decimals (int) – The decimal position / precision to round to.
Return type:: None
Returns:: None

_compute_bounds()[source]

Compute bounds for regular (rectangular or curvilinear) grids.

The bounds will be attached as coords to the xarray.Dataset of the Grid object. If no bounds can be created, a warning is issued.

_compute_hash()[source]

Compute md5 checksum for each component of the horizontal grid, including a potentially defined mask.

Stores the individual checksum of each component (lat, lon, lat_bnds, lon_bnds, mask) in a dictionary and returns an overall checksum.

Return type:: str
Returns:: str – md5 checksum of the checksums of all 5 grid components.
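The idea of a checksum built from per-component checksums can be sketched with hashlib. This is a minimal illustration, not the clisops implementation: the helper names, the string serialisation and the rounding to six decimals are assumptions.

```python
import hashlib


def component_checksum(values, decimals=6):
    """md5 of one grid component after rounding to a fixed precision."""
    serialised = ",".join(f"{v:.{decimals}f}" for v in values)
    return hashlib.md5(serialised.encode("utf-8")).hexdigest()


def overall_checksum(components):
    """Return the per-component checksums and the md5 of their concatenation."""
    per_component = {name: component_checksum(vals) for name, vals in components.items()}
    combined = "".join(per_component[name] for name in sorted(per_component))
    return per_component, hashlib.md5(combined.encode("utf-8")).hexdigest()


grid_a = {"lat": [10.0, 20.0], "lon": [0.0, 5.0]}
grid_b = {"lat": [10.000000001, 20.0], "lon": [0.0, 5.0]}  # differs below the precision
grid_c = {"lat": [10.5, 20.0], "lon": [0.0, 5.0]}

_, hash_a = overall_checksum(grid_a)
_, hash_b = overall_checksum(grid_b)
_, hash_c = overall_checksum(grid_c)
print(hash_a == hash_b, hash_a == hash_c)
```

Rounding before hashing is what lets two grids that differ only below the chosen precision compare as identical.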

static _create_collapse_mask(ds, lat_bnds, lon_bnds)[source]

Create a boolean mask indicating which grid cells collapse to lines or points.

Parameters:

ds (xarray.Dataset) – A dataset containing latitude and longitude bounds.
lat_bnds (str) – The name of the latitude bounds variable in the dataset.
lon_bnds (str) – The name of the longitude bounds variable in the dataset.

Returns:

xarray.DataArray – An integer mask indicating which grid cells collapse to lines or points (1: ok, 0: collapsed).

static _create_duplicate_mask(arr)[source]: Create duplicate mask helper function.

static _create_smashed_mask(ds, lat_bnds, lon_bnds)[source]

Create a boolean mask indicating which cells are smashed (i.e. cells with nearly identical opposite vertices).

Parameters:

ds (xarray.Dataset) – A dataset containing latitude and longitude bounds.
lat_bnds (str) – The name of the latitude bounds variable in the dataset.
lon_bnds (str) – The name of the longitude bounds variable in the dataset.

Returns:

xarray.DataArray – An integer mask indicating which cells are smashed (1: ok, 0: smashed).

_detect_mask()[source]: Detect mask helper function.

Warning

Not yet implemented, if at all necessary (e.g. for reformatting to SCRIP etc.).

_drop_vars(keep_attrs=False)[source]

Remove all non-necessary (non-horizontal) coords and data_vars of the Grid’s xarray.Dataset.

Parameters:: keep_attrs (bool) – Whether to keep the global attributes. The default is False.
Return type:: None

_get_title()[source]

Generate a title for the Grid with more information than the basic string representation.

Return type:: str

_grid_detect_collapsed_cells()[source]: Detect collapsing grid cells. Requires defined bounds.

_grid_detect_duplicated_cells()[source]

Detect a possible grid halo / duplicated cells.

Return type:: bool

_grid_detect_smashed_cells()[source]

Detect smashed grid cells (i.e. cells with nearly identical vertices).

Requires defined bounds.

_grid_from_ds_adaptive(ds)[source]: Create Grid of similar extent and resolution of input dataset.

_grid_from_id(grid_id)[source]: Load pre-defined grid from netCDF file.

_grid_from_instructor(grid_instructor)[source]: Process instructions to create regional or global grid (uses xESMF utility functions).

_grid_unstagger()[source]

Interpolate to cell center from cell edges, rotate vector variables in lat/lon direction.

Warning

This method is not yet implemented.

Return type:: None

_set_data_vars_and_coords()[source]

(Re)set xarray.Dataset.coords appropriately.

After opening/creating an xarray.Dataset, likely coordinates can be found set as data_vars, and data_vars set as coords. This method (re)sets the coords. Dimensionless variables that are not registered in any “coordinates” attribute are per default reset to data_vars, so xarray does not keep them in the dataset after remapping; an example for this is “rotated_latitude_longitude”.

_transfer_coords(source_grid, keep_attrs=True)[source]

Transfer all non-horizontal coordinates and optionally global attributes between two Grid objects.

Parameters:

source_grid (Grid) – Source Grid object to transfer the coords from.
keep_attrs (bool or str, optional) – Whether to also transfer the global attributes. False: do not transfer the global attributes. “target”: preserve the global attributes of the target Grid object. True: transfer the global attributes from source to target Grid object. The default is True.

Return type:

None

_verify_bounds()[source]

Use cf_xarray to obtain the variable name of the requested coordinates bounds.

Return type:: None
Returns:: str, optional – Returns the variable name of the requested coordinate bounds. Returns None if the variable has no bounds or if they cannot be identified.

compare_grid(ds_or_Grid, verbose=False)[source]

Compare two Grid objects.

Compares the checksums of two Grid objects, which depend on the lat and lon coordinate variables, their bounds and, if defined, a mask.

Parameters:

ds_or_Grid (xarray.Dataset or Grid) – Grid that the current Grid object shall be compared to.
verbose (bool) – Whether to also print the result. The default is False.

Return type:

bool

Returns:

bool – Returns True if the two Grids are considered identical within the defined precision, else returns False.

detect_bounds(coordinate)[source]

Use cf_xarray to get the variable name of the requested coordinates bounds.

Parameters:: coordinate (str) – Name of the coordinate variable to determine the bounds from.
Return type:: str | None
Returns:: str, optional – Returns the variable name of the requested coordinate bounds. Returns None if the variable has no bounds or if they cannot be identified.

detect_coordinate(coord_type)[source]

Use cf_xarray to obtain the variable name of the requested coordinate.

Parameters:: coord_type (str) – Coordinate type, e.g. ‘latitude’, ‘longitude’, ‘level’, ‘time’.
Return type:: str
Returns:: str – Coordinate variable name.
Raises:: KeyError – Raised if the requested coordinate cannot be identified.

detect_extent()[source]

Determine the grid extent in zonal / east-west direction (‘regional’ or ‘global’).

Return type:: str
Returns:: tuple of str – ‘regional’ or ‘global’ for the zonal and meridional extent, respectively.

detect_format()[source]

Detect format of a dataset.

Supported formats are ‘CF’, ‘SCRIP’, ‘xESMF’.

Return type:: str
Returns:: str – The format, if supported. Else raises an Exception.

detect_shape()[source]

Detect the shape of the grid.

Returns a tuple of (nlat, nlon, ncells). For an unstructured grid nlat and nlon are not defined and therefore the returned tuple will be (ncells, ncells, ncells).

Return type:

tuple[int, int, int]

Returns:

int – Number of latitude points in the grid.
int – Number of longitude points in the grid.
int – Number of cells in the grid.

detect_type()[source]

Detect type of the grid as one of “regular_lat_lon”, “curvilinear”, or “unstructured”.

Raises an Exception if the grid type is not supported.

Return type:: str
Returns:: str – The detected grid type.

grid_reformat(grid_format, keep_attrs=False)[source]

Reformat the Dataset attached to the Grid object to a target format.

Parameters:

grid_format (str) – Target format of the reformat operation. Currently supported are ‘SCRIP’, ‘CF’, ‘xESMF’.
keep_attrs (bool) – Whether to keep the global attributes.

Return type:

Dataset

Returns:

xarray.Dataset – Reformatted dataset.

Raises:

Exception – If the reformat operation is not defined in clisops.utils.dataset_utils.

static is_smashed_quad2D(coords)[source]

Determine if a quadrilateral (quad) is smashed or degenerate.

Parameters:: coords (numpy.ndarray) – Array of shape (4, 2) representing the coordinates of the four quad corners. Each row corresponds to a corner, and each column corresponds to a coordinate (latitude, longitude).
Returns:: bool – True if the quad is smashed, otherwise False.

static points_equal(p1, p2, tol=1e-15)[source]

Check if two points are equal within a given tolerance.

Parameters:

p1 (tuple or list) – First point as a tuple or list of coordinates (x, y).
p2 (tuple or list) – Second point as a tuple or list of coordinates (x, y).
tol (float) – Tolerance for equality check, default is 1e-15.

Return type:

bool

Returns:

bool – True if the points are equal within the tolerance, otherwise False.
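A tolerance-based point comparison and its use for detecting a smashed quad can be sketched as follows. Both helpers are hypothetical: the component-wise comparison and the rule "a quad is smashed when a pair of opposite corners coincides" are assumptions based on the descriptions above, not the clisops code.

```python
def points_close(p1, p2, tol=1e-15):
    """True if both coordinates of the two points differ by at most tol."""
    return abs(p1[0] - p2[0]) <= tol and abs(p1[1] - p2[1]) <= tol


def quad_is_smashed(corners, tol=1e-15):
    """Assumption: smashed means a pair of opposite corners coincides."""
    return points_close(corners[0], corners[2], tol) or points_close(
        corners[1], corners[3], tol
    )


regular = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
smashed = [(0.0, 0.0), (0.0, 1.0), (0.0, 0.0), (1.0, 0.0)]

print(quad_is_smashed(regular))
print(quad_is_smashed(smashed))
```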

to_netcdf(folder='./', filename='', grid_format='CF', engine=None, keep_attrs=True)[source]

Store a copy of the horizontal Grid as netCDF file on disk.

Define output folder, file name and output format (currently only ‘CF’ is supported). Does not overwrite an existing file.

Parameters:

folder (str or Path, optional) – Output folder. The default is the current working directory “./”.
filename (str, optional) – Output filename, to be defined separately from folder. The default is ‘grid_<grid.id>.nc’.
grid_format (str, optional) – The format the grid information shall be stored as (in terms of variable attributes and dimensions). The default is “CF”, which is currently the only supported output format.
engine (str, optional) – The engine to use for writing the netCDF file. If None, the default engine will be used.
keep_attrs (bool, optional) – Whether to store the global attributes in the output netCDF file. The default is True.

class clisops.core.regrid.Weights(grid_in, grid_out, method, from_disk=None, format=None)[source]

Bases: object

Creates remapping weights out of two Grid objects serving as source and target grid.

Reads weights from cache if possible or from disk if specified (not yet implemented). In the latter case, the weight file format has to be supported so that it can be reformatted to xESMF format.

Parameters:

grid_in (Grid) – Grid object serving as source grid.
grid_out (Grid) – Grid object serving as target grid.
method (str) – Remapping method the weights should be / have been calculated with. One of [“nearest_s2d”, “bilinear”, “conservative”, “patch”] if weights have to be calculated. Free text if weights are read from disk.
from_disk (str, optional) – Not yet implemented. Instead of calculating the regridding weights (or reading them from the cache), read them from disk. The default is None.
format (str, optional) – Not yet implemented. When reading weights from disk, the input format may be specified. If omitted, there will be an attempt to detect the format. The default is None.

_compute()[source]: Generate the weights with xESMF or read them from cache.

_detect_format(ds)[source]

Detect format of remapping weights (read from disk).

Warning

This method is not yet implemented.

Return type:: None

_generate_id()[source]

Create a unique id for a Weights object.

The id consists of: hashes / checksums of source and target grid (namely lat, lon, lat_bnds, lon_bnds variables); info about periodicity in longitude; info about collapsing cells; the remapping method.

Return type:: str
Returns:: str – The id as str.

_load_from_disk(filename=None, format=None)[source]

Read and process weights from disk.

Warning

This method is not yet implemented.

Return type:: None

_read_info_from_cache(key)[source]

Read info ‘key’ from cached metadata of current weight-file.

Returns the value for the given key, unless the key does not exist in the metadata or the file cannot be read. In this case, None is returned.

Parameters:: key (str)
Return type:: str | None
Returns:: str or None – Value for the given key, or None.
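The "return None when the key or file is unavailable" behaviour can be sketched with the standard json module. The helper read_info, the file layout and the key names are hypothetical and serve only to illustrate the pattern.

```python
import json
import tempfile
from pathlib import Path


def read_info(metadata_path, key):
    """Return metadata[key], or None if the file or the key is unavailable."""
    try:
        with open(metadata_path, encoding="utf-8") as handle:
            metadata = json.load(handle)
    except (OSError, json.JSONDecodeError):
        return None  # unreadable or malformed metadata file
    if not isinstance(metadata, dict):
        return None
    return metadata.get(key)


with tempfile.TemporaryDirectory() as tmp:
    meta = Path(tmp) / "weights.json"
    meta.write_text(json.dumps({"method": "bilinear"}), encoding="utf-8")
    found = read_info(meta, "method")
    missing_key = read_info(meta, "uid")
    missing_file = read_info(Path(tmp) / "absent.json", "method")

print(found, missing_key, missing_file)
```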

_save_to_cache(store_weights)[source]

Save Weights and source/target grids to cache (netCDF), including metadata (JSON).

Return type:: None

reformat(format_from, format_to)[source]

Reformat remapping weights.

Warning

This method is not yet implemented.

Return type:: None

save_to_disk(filename=None, wformat='xESMF')[source]

Write weights to disk in a certain format.

Warning

This method is not yet implemented.

Return type:: None

clisops.core.regrid.regrid(grid_in, grid_out, weights, adaptive_masking_threshold=0.5, keep_attrs=True)[source]

Perform regridding operation including dealing with dataset and variable attributes.

Parameters:

grid_in (Grid) – The Grid object of the source grid, e.g. created out of source xarray.Dataset.
grid_out (Grid) – The Grid object of the target grid.
weights (Weights) – The Weights object, as created by using grid_in and grid_out Grid objects as input.
adaptive_masking_threshold (float, optional) – (AMT) A value within the [0., 1.] interval that defines the maximum RATIO of missing_values amongst the total number of data values contributing to the calculation of the target grid cell value. If a fraction within [0., AMT[ of the contributing source data is missing, the target grid cell value is re-normalized by the factor 1./(1.-RATIO); otherwise, the target grid cell is set to missing_value. Thus, if AMT is set to 1, all source grid cells that contribute to a target grid cell must be missing in order for the target grid cell to be defined as missing itself. Values greater than 1 or less than 0 turn adaptive masking off. This adaptive masking technique allows generated weights to be reused for differently masked data (e.g. land-sea masks or orographic masks that vary with depth / height). The default is 0.5.
keep_attrs (bool or str) – Sets the global attributes of the resulting dataset, apart from the ones set by this routine. True: attributes of grid_in.ds will be in the resulting dataset. False: no attributes but the ones newly set by this routine. “target”: attributes of grid_out.ds will be in the resulting dataset. The default is True.

Return type:

Dataset

Returns:

xarray.Dataset – The regridded data in form of an xarray.Dataset.
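The effect of adaptive_masking_threshold on a single target cell can be sketched in plain Python. This is an illustration under stated assumptions, not the clisops or xESMF implementation: AMT is read as the maximum tolerated missing fraction, RATIO is taken as the weighted fraction of missing source data, and with masking turned off any missing value is assumed to make the target cell missing.

```python
import math


def remap_cell(values, weights, amt=0.5):
    """Remap one target cell; math.nan marks missing source values."""
    valid = [(v, w) for v, w in zip(values, weights) if not math.isnan(v)]
    valid_weight = sum(w for _, w in valid)
    weighted_sum = sum(v * w for v, w in valid)
    ratio = 1.0 - valid_weight  # weighted fraction of missing source data
    if ratio == 0.0:
        return weighted_sum  # nothing missing, no re-normalization needed
    if not 0.0 <= amt <= 1.0:
        return math.nan  # assumption: masking off, missing values propagate
    if ratio >= amt:
        return math.nan  # too much of the contributing data is missing
    return weighted_sum / (1.0 - ratio)  # re-normalize by 1 / (1 - RATIO)


weights = [0.25, 0.25, 0.25, 0.25]
one_missing = remap_cell([1.0, math.nan, 3.0, 5.0], weights, amt=0.5)
three_missing = remap_cell([math.nan, math.nan, math.nan, 5.0], weights, amt=0.5)
lenient = remap_cell([math.nan, math.nan, math.nan, 5.0], weights, amt=1.0)
all_missing = remap_cell([math.nan] * 4, weights, amt=1.0)

print(one_missing, three_missing, lenient, all_missing)
```

Because the re-normalization happens at remapping time, the same weights can serve data with different masks.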

clisops.core.regrid.weights_cache_flush(weights_dir_init='', dryrun=False, verbose=False)[source]

Flush and reinitialize the local weights cache.

Parameters:

weights_dir_init (str, optional) – Directory name to reinitialize the local weights cache in. It will be created if it does not exist. The default is CONFIG[“clisops:grid_weights”][“local_weights_dir”] as defined in roocs.ini (or as redefined by a manual weights_cache_init call).
dryrun (bool, optional) – If True, it will only print all files that would get deleted. The default is False.
verbose (bool, optional) – If True, and dryrun is False, will print all files that are getting deleted. The default is False.
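The interaction of dryrun and verbose can be sketched with pathlib on a temporary directory. The helper flush_directory is hypothetical and only mirrors the documented behaviour (list files on a dry run, delete them otherwise); it does not touch the real clisops cache.

```python
import tempfile
from pathlib import Path


def flush_directory(directory, dryrun=False, verbose=False):
    """Delete all regular files in directory and return their names."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)  # created if it does not exist
    affected = sorted(p for p in directory.iterdir() if p.is_file())
    for path in affected:
        if dryrun:
            print(f"Would delete {path.name}")
        else:
            if verbose:
                print(f"Deleting {path.name}")
            path.unlink()
    return [p.name for p in affected]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "grid_weights"
    cache.mkdir()
    (cache / "weights_a.nc").write_text("stub")
    (cache / "weights_a.json").write_text("{}")

    listed = flush_directory(cache, dryrun=True)
    remaining_after_dryrun = len(list(cache.iterdir()))

    deleted = flush_directory(cache, verbose=True)
    remaining_after_flush = len(list(cache.iterdir()))
```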

clisops.core.regrid.weights_cache_init(weights_dir=None, config=CONFIG)[source]

Initialise global variable weights_dir as used by the Weights class.

Parameters:

weights_dir (str or Path) – Directory name to initialise the local weights cache in. It will be created if it does not exist. Per default, this function is called upon import with weights_dir as defined in roocs.ini.
config (dict) – Configuration dictionary as read from top-level.

clisops.core.subset module

Subset module.

clisops.core.subset.assign_bounds(bounds, coord)[source]

Replace unset boundaries by the minimum and maximum coordinates.

Parameters:

bounds (tuple[float or None, float or None]) – Boundaries.
coord (xarray.DataArray) – Grid coordinates.

Return type:

tuple[float | None, float | None]

Returns:

tuple[float or None, float or None] – Lower and upper grid boundaries.
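Replacing unset boundaries amounts to the following logic, shown here as a minimal sketch on a plain list of coordinate values. The helper fill_bounds is hypothetical and stands in for the behaviour described above.

```python
def fill_bounds(bounds, coord_values):
    """Replace None entries in (lower, upper) by the coordinate minimum and maximum."""
    lower, upper = bounds
    if lower is None:
        lower = min(coord_values)
    if upper is None:
        upper = max(coord_values)
    return (lower, upper)


lat = [10.0, 20.0, 50.0]
print(fill_bounds((None, 40.0), lat))
print(fill_bounds((None, None), lat))
```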

clisops.core.subset.create_mask(*, x_dim, y_dim, poly, wrap_lons=False, check_overlap=False)[source]

Create a mask with values corresponding to the features in a GeoDataFrame using vectorized methods.

The returned mask’s points have the value of poly’s first geometry that they fall in.

Parameters:

x_dim (xarray.DataArray) – X or longitudinal dimension of the xarray object.
y_dim (xarray.DataArray) – Y or latitudinal dimension of the xarray object.
poly (gpd.GeoDataFrame) – A GeoDataFrame used to create the xarray.DataArray mask. If its index doesn’t have an integer dtype, it will be reset to integers, which will be used in the mask.
wrap_lons (bool) – Shift vector longitudes from the [-180, 180] degree range to the [0, 360] degree range. Default = False.
check_overlap (bool) – Perform a check to verify if shapes contain overlapping geometries.

Return type:

DataArray

Returns:

xarray.DataArray – The mask array.

Examples

import geopandas as gpd
import xarray as xr
from clisops.core.subset import create_mask

ds = xr.open_dataset(path_to_tasmin_file)
polys = gpd.read_file(path_to_multi_shape_file)

# Get a mask from all polygons in the shape file
mask = create_mask(x_dim=ds.lon, y_dim=ds.lat, poly=polys)
ds = ds.assign_coords(regions=mask)

# Operations can be applied to each region with `groupby`. Ex:
ds = ds.groupby("regions").mean()

# Extra step to retrieve the names of those polygons stored in another column (here "id")
region_names = xr.DataArray(polys.id, dims=("regions",))
ds = ds.assign_coords(regions_names=region_names)
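The rule that each point takes the value of the first geometry it falls in can be sketched with axis-aligned boxes standing in for polygons. The helper first_region is hypothetical, and returning None for points outside every region is an assumption made for this illustration.

```python
def first_region(lon, lat, regions):
    """Return the index of the first box (xmin, ymin, xmax, ymax) containing the point."""
    for index, (xmin, ymin, xmax, ymax) in enumerate(regions):
        if xmin <= lon <= xmax and ymin <= lat <= ymax:
            return index
    return None  # assumption: points outside all regions carry no region value


# Two overlapping regions; points in the overlap belong to the first one.
regions = [(0.0, 0.0, 10.0, 10.0), (5.0, 5.0, 15.0, 15.0)]

in_overlap = first_region(7.0, 7.0, regions)
in_second = first_region(12.0, 12.0, regions)
outside = first_region(20.0, 20.0, regions)
print(in_overlap, in_second, outside)
```

Setting check_overlap=True in create_mask is the documented way to verify whether such overlaps exist in the input shapes.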

clisops.core.subset.create_weight_masks(ds_in, poly)[source]

Create weight masks corresponding to the features in a GeoDataFrame using xESMF.

The values of the returned masks are the fraction of the corresponding polygon’s area that is covered by the grid cell. Summing along the spatial dimension will give 1 for each geometry. Requires xESMF.

Parameters:

ds_in (xarray.DataArray or xarray.Dataset) – An xarray object containing the grid information, as understood by xESMF. For 2D lat/lon coordinates, the bounded arrays are required.
poly (gpd.GeoDataFrame) – GeoDataFrame used to create the xarray.DataArray mask. One mask will be created for each row in the dataframe. Will be converted to EPSG:4326 if needed.

Return type:

DataArray

Returns:

xarray.DataArray – Has a new geom dimension corresponding to the index of the input GeoDataFrame. Non-geometry columns of poly are copied as auxiliary coordinates.

Examples

import geopandas as gpd
import xarray as xr

from clisops.core.subset import create_weight_masks

ds = xr.open_dataset(path_to_tasmin_file)
polys = gpd.read_file(path_to_multi_shape_file)

# Get a weight mask for each polygon in the shape file
mask = create_weight_masks(ds, poly=polys)
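
The property that the weights of one geometry sum to 1 can be checked on a toy case. This is a plain-Python sketch under simplifying assumptions (planar areas, a rectangular "polygon", a regular grid given by its cell edges); xESMF works with true cell geometries, and `box_weights` is a hypothetical helper, not part of clisops.

```python
def overlap_1d(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def box_weights(lon_edges, lat_edges, box):
    """Fraction of the box area covered by each grid cell (planar approximation)."""
    xmin, ymin, xmax, ymax = box
    area = (xmax - xmin) * (ymax - ymin)
    weights = []
    for j in range(len(lat_edges) - 1):
        row = []
        for i in range(len(lon_edges) - 1):
            dx = overlap_1d(lon_edges[i], lon_edges[i + 1], xmin, xmax)
            dy = overlap_1d(lat_edges[j], lat_edges[j + 1], ymin, ymax)
            row.append(dx * dy / area)
        weights.append(row)
    return weights


# 3 x 3 grid of unit cells and a hypothetical box straddling four of them
weights = box_weights([0, 1, 2, 3], [0, 1, 2, 3], (0.5, 0.5, 2.0, 1.5))
total = sum(sum(row) for row in weights)  # close to 1.0 by construction
```

A weighted spatial mean for the geometry is then the sum of `weights * data` over the grid, with no further normalisation needed as long as the polygon lies fully inside the grid.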

clisops.core.subset.distance(da, *, lon, lat, mask=None)[source]

Return distance to a point in meters.

Parameters:

da (xarray.DataArray or xarray.Dataset) – Input data.
lon (float, sequence of floats, or xarray.DataArray) – Longitude coordinate.
lat (float, sequence of floats, or xarray.DataArray) – Latitude coordinate.
mask (np.ndarray or xarray.DataArray, optional) – 2d boolean array with the same spatial dimensions as da, where True values indicate valid grid points to be considered for distance calculation. Optional.

Return type:

DataArray | Dataset

Returns:

xarray.DataArray – Distance in meters to point.

Examples

import numpy as np
import xarray as xr
from clisops.core.subset import distance

# To get the indices from the closest point, use:
da = xr.open_dataset(path_to_pr_file).pr
d = distance(da, lon=-75, lat=45)
k = d.argmin()
i, j, _ = np.unravel_index(k, d.shape)
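
For intuition, the same "distance field, then argmin" pattern can be written in plain Python. The sketch below uses the haversine formula on a sphere of assumed radius 6 371 km; it is an approximation for illustration only, and clisops may compute distances differently (for instance on an ellipsoid). `haversine_m` is a hypothetical helper name.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical assumption


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two points given in degrees."""
    lam1, phi1, lam2, phi2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (
        math.sin((phi2 - phi1) / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


lons = [-76.0, -75.0, -74.0]
lats = [44.0, 45.0, 46.0]

# Distance from every grid point to the target (lon=-75, lat=45), laid out lat x lon
d = [[haversine_m(lon, lat, -75.0, 45.0) for lon in lons] for lat in lats]

# Indices (lat index, lon index) of the closest grid point
j, i = min(
    ((jj, ii) for jj in range(len(lats)) for ii in range(len(lons))),
    key=lambda idx: d[idx[0]][idx[1]],
)
```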

clisops.core.subset.get_lat(ds)[source]

Get latitude coordinate from a Dataset or DataArray.

Parameters:

ds (xarray.Dataset or xarray.DataArray) – Input dataset or data array.

Return type:

DataArray

Returns:

xarray.DataArray – The latitude coordinate from the dataset or data array.

clisops.core.subset.get_lon(ds)[source]

Get longitude coordinate from a Dataset or DataArray.

Parameters:

ds (xarray.Dataset or xarray.DataArray) – Input dataset or data array.

Return type:

DataArray

Returns:

xarray.DataArray – The longitude coordinate from the dataset or data array.

clisops.core.subset.shape_bbox_indexer(ds, poly)[source]

Return a spatial indexer that selects the indices of the grid cells covering the given geometries.

Parameters:

ds (xr.Dataset) – Input dataset.
poly (gpd.GeoDataFrame, gpd.GeoSeries, gpd.array.GeometryArray, or list of shapely geometries) – Shapes to cover. It can be of type Polygon, MultiPolygon, Point, or MultiPoint.

Returns:

dict – An xarray indexer along native dataset coordinates, to be used as an argument to isel.

Notes

This is used in particular to restrict the domain of a dataset before computing the weights for a spatial average.
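
The idea of restricting the domain with positional slices can be sketched on ascending 1D coordinates. This is an illustrative plain-Python version, not the clisops algorithm: `bbox_indexer` is a hypothetical helper, the dimension names "lon" and "lat" are assumed, and padding the selection by one cell on each side is a choice made here so that cells straddling the bounding box edge are retained.

```python
def bbox_indexer(lons, lats, bounds):
    """Build a dict of slices covering bounds = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bounds

    def covering_slice(coords, low, high):
        inside = [k for k, c in enumerate(coords) if low <= c <= high]
        if not inside:
            raise ValueError("bounds do not intersect the grid")
        # Pad by one cell on each side (assumption of this sketch)
        start = max(inside[0] - 1, 0)
        stop = min(inside[-1] + 2, len(coords))
        return slice(start, stop)

    return {
        "lon": covering_slice(lons, xmin, xmax),
        "lat": covering_slice(lats, ymin, ymax),
    }


lons = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
lats = [10.0, 11.0, 12.0, 13.0]

indexer = bbox_indexer(lons, lats, (1.5, 10.5, 3.2, 11.5))
sub_lons = lons[indexer["lon"]]
sub_lats = lats[indexer["lat"]]
```

A dict of this form is what `isel` accepts, so the expensive weight computation only sees the cells near the shapes.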

Examples

>>> indexer = shape_bbox_indexer(ds, poly)
>>> ds.isel(indexer)

clisops.core.subset.subset_bbox(da, lon_bnds=None, lat_bnds=None, start_date=None, end_date=None, first_level=None, last_level=None, time_values=None, level_values=None)[source]

Subset a DataArray or Dataset spatially (and temporally) using a lat lon bounding box and date selection.

Return a subset of a DataArray or Dataset for grid points falling within a spatial bounding box defined by longitude and latitudinal bounds and for dates falling within provided bounds.

TODO: returns the what? In the case of a lat-lon rectilinear grid, this simply returns the

Parameters:

da (Union[xarray.DataArray, xarray.Dataset]) – Input data.
lon_bnds (np.ndarray or tuple[float or None, float or None], optional) – List of minimum and maximum longitudinal bounds. Optional. Defaults to all longitudes in original data-array.
lat_bnds (np.ndarray or tuple[float or None, float or None], optional) – List of minimum and maximum latitudinal bounds. Optional. Defaults to all latitudes in original data-array.
start_date (str, optional) – Start date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day (“%Y-%m-%d”). Defaults to first day of input data-array.
end_date (str, optional) – End date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day (“%Y-%m-%d”). Defaults to last day of input data-array.
first_level (int or float, optional) – First level of the subset. Can be either an integer or float. Defaults to first level of input data-array.
last_level (int or float, optional) – Last level of the subset. Can be either an integer or float. Defaults to last level of input data-array.
time_values (sequence of str, optional) – A list of datetime strings to subset.
level_values (sequence of int or float, optional) – A list of level values to select.

Return type:

DataArray | Dataset

Returns:

Union[xarray.DataArray, xarray.Dataset] – Subsetted xarray.DataArray or xarray.Dataset.

Notes

subset_bbox expects the lower and upper bounds to be provided in ascending order. If the actual coordinate values are descending then this will be detected and your selection reversed before the data subset is returned.

Examples

import xarray as xr
from clisops.core.subset import subset_bbox

ds = xr.open_dataset(path_to_pr_file)

# Subset lat lon
prSub = subset_bbox(ds.pr, lon_bnds=[-75, -70], lat_bnds=[40, 45])
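
The note about descending coordinates can be made concrete with a small sketch. The helper below (`label_slice`, a hypothetical name) takes ascending bounds and returns a positional slice that works whether the coordinate is stored in ascending or descending order. It only illustrates the behaviour described in the note and is not taken from the clisops source.

```python
from bisect import bisect_left, bisect_right


def label_slice(coords, low, high):
    """Positional slice selecting low <= value <= high on a monotonic 1D coordinate."""
    descending = coords[0] > coords[-1]
    ascending = coords[::-1] if descending else coords
    start = bisect_left(ascending, low)
    stop = bisect_right(ascending, high)
    if descending:
        # Map positions found on the reversed copy back to the stored order
        n = len(coords)
        start, stop = n - stop, n - start
    return slice(start, stop)


lat_asc = [30.0, 35.0, 40.0, 45.0, 50.0]
lat_desc = lat_asc[::-1]

sel_asc = label_slice(lat_asc, 40, 45)
sel_desc = label_slice(lat_desc, 40, 45)
```

In both cases the bounds are given as (40, 45); the selected values keep the order in which they are stored in the data.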

clisops.core.subset.subset_gridpoint(da, lon=None, lat=None, method='distance', start_date=None, end_date=None, first_level=None, last_level=None, tolerance=None, add_distance=False, mask=None)[source]

Extract one or more of the nearest gridpoint(s) from a DataArray or Dataset based on lat lon coordinate(s).

Return a subsetted data array (or Dataset) for the grid point(s) falling nearest the input longitude and latitude coordinates. Optionally, also subset the data along time for dates falling within the provided bounds. If 1D sequences of coordinates are given, the gridpoints will be concatenated along the new dimension “site”.

Parameters:

da (xarray.DataArray or xarray.Dataset) – Input data.
lon (float, Sequence[float], xarray.DataArray, optional) – Longitude coordinate(s). Must be of the same length as lat.
lat (float, Sequence[float], xarray.DataArray, optional) – Latitude coordinate(s). Must be of the same length as lon.
method (str, optional) – Method to use for finding the nearest grid point. Options are “geographic” and “distance” (default); “geographic” uses longitude and latitude coordinates directly while “distance” calculates distance on the Earth’s surface.
start_date (str, optional) – Start date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day (“%Y-%m-%d”). Defaults to first day of input data-array.
end_date (str, optional) – End date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day (“%Y-%m-%d”). Defaults to last day of input data-array.
first_level (int or float, optional) – First level of the subset. Can be either an integer or float. Defaults to first level of input data-array.
last_level (int or float, optional) – Last level of the subset. Can be either an integer or float. Defaults to last level of input data-array.
tolerance (int or float, optional) – Masks values if the distance to the nearest gridpoint is larger than tolerance in meters.
add_distance (bool) – Whether to add a new coordinate “distance” to the output DataArray or Dataset.
mask (np.ndarray or xarray.DataArray, optional) – 2d boolean array with the same spatial dimensions as da, where True values indicate valid grid points to be considered for subsetting.

Return type:

DataArray | Dataset

Returns:

xarray.DataArray or xarray.Dataset – Subsetted xarray.DataArray or xarray.Dataset.

Examples

import xarray as xr
from clisops.core.subset import subset_gridpoint

ds = xr.open_dataset(path_to_pr_file)

# Subset lat lon point
prSub = subset_gridpoint(ds.pr, lon=-75, lat=45)

# Subset multiple variables in a single dataset
ds = xr.open_mfdataset([path_to_tasmax_file, path_to_tasmin_file])
dsSub = subset_gridpoint(ds, lon=-75, lat=45)

clisops.core.subset.subset_level(da, first_level=None, last_level=None)[source]

Subset input DataArray or Dataset based on first and last levels.

Return a subset of a DataArray or Dataset for levels falling within the provided bounds.

Parameters:

da (xarray.DataArray or xarray.Dataset) – Input data.
first_level (int or float or str, optional) – First level of the subset (specified as the value, not the index). Can be either an integer or float. Defaults to first level of input data-array.
last_level (int or float or str, optional) – Last level of the subset (specified as the value, not the index). Can be either an integer or float. Defaults to last level of input data-array.

Return type:

DataArray | Dataset

Returns:

xarray.DataArray or xarray.Dataset – Subsetted xarray.DataArray or xarray.Dataset.

Examples

import xarray as xr
from clisops.core.subset import subset_level

ds = xr.open_dataset(path_to_pr_file)

# Subset complete levels
prSub = subset_level(ds.pr, first_level=0, last_level=30)

# Subset single level
prSub = subset_level(ds.pr, first_level=1000, last_level=1000)

# Subset multiple variables in a single dataset
ds = xr.open_mfdataset([path_to_tasmax_file, path_to_tasmin_file])
dsSub = subset_level(ds, first_level=1000.0, last_level=850.0)

clisops.core.subset.subset_level_by_values(da, level_values=None)[source]

Subset input DataArray or Dataset based on a sequence of vertical level values.

Return a subset of a DataArray or Dataset for levels matching those requested.

Parameters:

da (xarray.DataArray or xarray.Dataset) – Input data.
level_values (Sequence[float] or Sequence[int], optional) – A list of level values to select.

Return type:

DataArray | Dataset

Returns:

xarray.DataArray or xarray.Dataset – Subsetted xarray.DataArray or xarray.Dataset.

Notes

If any levels are not found, a ValueError will be raised. The requested levels will automatically be re-ordered to match the order in the input dataset.

Examples

import xarray as xr
from clisops.core.subset import subset_level_by_values

ds = xr.open_dataset(path_to_pr_file)

# Subset a selection of levels
levels = [1000.0, 850.0, 250.0, 100.0]
prSub = subset_level_by_values(ds.pr, level_values=levels)
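
The two behaviours mentioned in the Notes (an error on missing levels, and re-ordering to the dataset's own order) can be sketched in a few lines of plain Python. `select_levels` is a hypothetical helper and exact float equality is assumed for matching; the actual matching rules in clisops may differ.

```python
def select_levels(levels, requested):
    """Return the requested levels in the order they appear in `levels`."""
    missing = [value for value in requested if value not in levels]
    if missing:
        raise ValueError(f"Levels not found in the dataset: {missing}")
    wanted = set(requested)
    # Iterate over the dataset levels so the result follows the dataset order
    return [level for level in levels if level in wanted]


dataset_levels = [1000.0, 925.0, 850.0, 500.0, 250.0, 100.0]

# Requested out of order; returned in dataset order
selected = select_levels(dataset_levels, [250.0, 1000.0, 850.0])
```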

clisops.core.subset.subset_shape(ds, shape, shape_crs=None, buffer=None, start_date=None, end_date=None, first_level=None, last_level=None)[source]

Subset a DataArray or Dataset spatially (and temporally) using a vector shape and date selection.

Return a subset of a DataArray or Dataset for grid points falling within the area of a Polygon and/or MultiPolygon shape, or grid points along the path of a LineString and/or MultiLineString. If the shape consists of several disjoint polygons, the output is cut to the smallest bbox including all polygons.

Parameters:

ds (xarray.DataArray or xarray.Dataset) – Input values.
shape (str or path or gpd.GeoDataFrame) – Path to a shape file, or GeoDataFrame directly. Supports GeoPandas-compatible formats.
shape_crs (str or int, optional) – EPSG number or PROJ4 string.
buffer (int or float, optional) – Buffer the shape in order to select a larger region stemming from it. Units are those of the shape’s CRS (degrees or metres).
start_date (str, optional) – Start date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day (“%Y-%m-%d”). Defaults to first day of input data-array.
end_date (str, optional) – End date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day (“%Y-%m-%d”). Defaults to last day of input data-array.
first_level (int or float, optional) – First level of the subset. Can be either an integer or float. Defaults to first level of input data-array.
last_level (int or float, optional) – Last level of the subset. Can be either an integer or float. Defaults to last level of input data-array.

Return type:

DataArray | Dataset

Returns:

xarray.DataArray or xarray.Dataset – A subset of ds.

Notes

If no CRS is found in the shape provided (e.g. RFC-7946 GeoJSON, https://en.wikipedia.org/wiki/GeoJSON), a decimal degree datum (CRS84) is assumed. Be advised that EPSG:4326 and OGC:CRS84 are not identical, as the axis order of latitude and longitude differs between the two (for more information, see: https://github.com/OSGeo/gdal/issues/2035).

Examples

import xarray as xr
from clisops.core.subset import subset_shape

pr = xr.open_dataset(path_to_pr_file).pr

# Subset data array by shape
prSub = subset_shape(pr, shape=path_to_shape_file)

# Subset data array by shape and single year
prSub = subset_shape(pr, shape=path_to_shape_file, start_date="1990-01-01", end_date="1990-12-31")

# Subset multiple variables in a single dataset
ds = xr.open_mfdataset([path_to_tasmin_file, path_to_tasmax_file])
dsSub = subset_shape(ds, shape=path_to_shape_file)

clisops.core.subset.subset_time(da, start_date=None, end_date=None)[source]

Subset input DataArray or Dataset based on start and end dates.

Return a subset of a DataArray or Dataset for dates falling within the provided bounds.

Parameters:

da (xarray.DataArray or xarray.Dataset) – Input data.
start_date (str, optional) – Start date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day (“%Y-%m-%d”). Defaults to first day of input data-array.
end_date (str, optional) – End date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day (“%Y-%m-%d”). Defaults to last day of input data-array.

Return type:

DataArray | Dataset

Returns:

xarray.DataArray or xarray.Dataset – Subsetted xarray.DataArray or xarray.Dataset.

Notes

TODO: Add notes about different calendar types. Avoid “%Y-%m-31”. To select a complete month, use only “%Y-%m”.

Examples

import xarray as xr
from clisops.core.subset import subset_time

ds = xr.open_dataset(path_to_pr_file)

# Subset complete years
prSub = subset_time(ds.pr, start_date="1990", end_date="1999")

# Subset single complete year
prSub = subset_time(ds.pr, start_date="1990", end_date="1990")

# Subset multiple variables in a single dataset
ds = xr.open_mfdataset([path_to_tasmax_file, path_to_tasmin_file])
dsSub = subset_time(ds, start_date="1990", end_date="1999")

# Subset with year-month precision - Example subset 1990-03-01 to 1999-08-31 inclusively
txSub = subset_time(ds.tasmax, start_date="1990-03", end_date="1999-08")

# Subset with specific start_dates and end_dates
tnSub = subset_time(ds.tasmin, start_date="1990-03-13", end_date="1990-08-17")
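
How partial date strings translate into inclusive bounds can be sketched with the standard library. The helper below (`expand_bounds`, a hypothetical name) assumes the standard Gregorian calendar; datasets on other calendars (for example 360-day) have different month lengths, which is one reason to prefer “%Y-%m” over a hard-coded day such as 31. It illustrates the convention shown in the examples above and is not the clisops implementation.

```python
import calendar
from datetime import date


def expand_bounds(start, end):
    """Expand "%Y", "%Y-%m" or "%Y-%m-%d" strings into inclusive first/last dates."""

    def parse(text, last):
        parts = [int(p) for p in text.split("-")]
        year = parts[0]
        month = parts[1] if len(parts) > 1 else (12 if last else 1)
        if len(parts) > 2:
            day = parts[2]
        else:
            # Last day of the month for an end bound, first day otherwise
            day = calendar.monthrange(year, month)[1] if last else 1
        return date(year, month, day)

    return parse(start, last=False), parse(end, last=True)


first, last = expand_bounds("1990-03", "1999-08")
year_first, year_last = expand_bounds("1990", "1990")
leap_end = expand_bounds("2000-02", "2000-02")[1]
```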

clisops.core.subset.subset_time_by_components(da, *, time_components=None)[source]

Subset by one or more time components (year, month, day, etc.).

Parameters:

da (xarray.DataArray or xarray.Dataset) – Input data.
time_components (dict, optional) – Components of time to subset by.

Return type:

DataArray

Returns:

xarray.DataArray – Subsetted xarray.DataArray or xarray.Dataset.

Examples

import xarray as xr
from clisops.core.subset import subset_time_by_components

# To select all Winter months (Dec, Jan, Feb) or [12, 1, 2]:
da = xr.open_dataset(path_to_file).pr
winter_dict = {"month": [12, 1, 2]}
res = subset_time_by_components(da, time_components=winter_dict)
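
The selection logic behind a components dictionary can be mimicked with standard-library dates. In this sketch, `select_by_components` is a hypothetical helper that keeps a time step only if every listed component matches; the component names are assumed to map directly onto `date` attributes such as `year`, `month` and `day`.

```python
from datetime import date, timedelta


def select_by_components(times, components):
    """Keep the times whose attributes match every listed component."""
    return [
        t
        for t in times
        if all(getattr(t, name) in allowed for name, allowed in components.items())
    ]


# One value per day for the leap year 2000
start = date(2000, 1, 1)
times = [start + timedelta(days=n) for n in range(366)]

winter = select_by_components(times, {"month": [12, 1, 2]})
first_of_winter_months = select_by_components(times, {"month": [12, 1, 2], "day": [1]})
```

Because the filter keeps the original ordering, December values of a given year appear after January and February of that same year, not grouped into a contiguous season.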

clisops.core.subset.subset_time_by_values(da, time_values=None)[source]

Subset input DataArray or Dataset based on a sequence of datetime strings.

Return a subset of a DataArray or Dataset for datetime objects matching those requested.

Parameters:

da (xarray.DataArray or xarray.Dataset) – Input data.
time_values (sequence of str, optional) – Values for time. Default: None.

Return type:

DataArray | Dataset

Returns:

xarray.DataArray or xarray.Dataset – Subsetted xarray.DataArray or xarray.Dataset.

Notes

If any datetimes are not found, a ValueError will be raised. The requested datetimes will automatically be reordered to match the order found in the input dataset.

Examples

import xarray as xr
from clisops.core.subset import subset_time_by_values

ds = xr.open_dataset(path_to_pr_file)

# Subset a selection of datetimes
times = ["2015-01-01", "2018-12-05", "2021-06-06"]
prSub = subset_time_by_values(ds.pr, time_values=times)