API

Core subset functionality

Subset module.

clisops.core.subset.create_mask(*, x_dim: DataArray, y_dim: DataArray, poly: GeoDataFrame, wrap_lons: bool = False, check_overlap: bool = False) DataArray[source]

Create a mask with values corresponding to the features in a GeoDataFrame using vectorized methods.

The returned mask’s points have the value of the first geometry of poly they fall in.

Parameters:
  • x_dim (xarray.DataArray) – X or longitudinal dimension of the xarray object.

  • y_dim (xarray.DataArray) – Y or latitudinal dimension of the xarray object.

  • poly (gpd.GeoDataFrame) – A GeoDataFrame used to create the xarray.DataArray mask. If its index doesn’t have an integer dtype, it will be reset to integers, which will be used in the mask.

  • wrap_lons (bool) – Shift the vector longitudes from the [-180, 180] degree frame to the [0, 360] degree frame. Default = False.

  • check_overlap (bool) – Perform a check to verify if shapes contain overlapping geometries.

Returns:

xarray.DataArray

Examples

import geopandas as gpd
import xarray as xr

from clisops.core.subset import create_mask

ds = xr.open_dataset(path_to_tasmin_file)
polys = gpd.read_file(path_to_multi_shape_file)

# Get a mask from all polygons in the shape file
mask = create_mask(x_dim=ds.lon, y_dim=ds.lat, poly=polys)
ds = ds.assign_coords(regions=mask)

# Operations can be applied to each region with `groupby`. Ex:
ds = ds.groupby("regions").mean()

# Extra step to retrieve the names of those polygons stored in another column (here "id")
region_names = xr.DataArray(polys.id, dims=("regions",))
ds = ds.assign_coords(regions_names=region_names)

clisops.core.subset.create_weight_masks(ds_in: DataArray | Dataset, poly: GeoDataFrame) DataArray[source]

Create weight masks corresponding to the features in a GeoDataFrame using xESMF.

The returned masks’ values are the fraction of the corresponding polygon’s area that is covered by the grid cell. Summing along the spatial dimension will give 1 for each geometry. Requires xESMF.

Parameters:
  • ds_in (Union[xarray.DataArray, xarray.Dataset]) – xarray object containing the grid information, as understood by xESMF. For 2D lat/lon coordinates, the bounds arrays are required.

  • poly (gpd.GeoDataFrame) – GeoDataFrame used to create the xarray.DataArray mask. One mask will be created for each row in the dataframe. Will be converted to EPSG:4326 if needed.

Returns:

xarray.DataArray – Has a new geom dimension corresponding to the index of the input GeoDataFrame. Non-geometry columns of poly are copied as auxiliary coordinates.

Examples

>>> import geopandas as gpd  
>>> import xarray as xr  
>>> from clisops.core.subset import create_weight_masks  
>>> ds = xr.open_dataset(path_to_tasmin_file)  
>>> polys = gpd.read_file(path_to_multi_shape_file)  
>>> # Get a weight mask for each polygon in the shape file
>>> masks = create_weight_masks(ds_in=ds, poly=polys)

clisops.core.subset.distance(da: DataArray | Dataset, *, lon: float | Sequence[float] | DataArray, lat: float | Sequence[float] | DataArray) DataArray | Dataset[source]

Return distance to a point in meters.

Parameters:
  • da (Union[xarray.DataArray, xarray.Dataset]) – Input data.

  • lon (Union[float, Sequence[float], xarray.DataArray]) – Longitude coordinate.

  • lat (Union[float, Sequence[float], xarray.DataArray]) – Latitude coordinate.

Returns:

xarray.DataArray – Distance in meters to point.

Examples

import numpy as np
import xarray as xr

from clisops.core.subset import distance

da = xr.open_dataset(path_to_pr_file).pr

# To get the indices of the closest point to (lon=-75, lat=45):
d = distance(da, lon=-75, lat=45)
k = d.argmin()
i, j, _ = np.unravel_index(k, d.shape)

clisops.core.subset.shape_bbox_indexer(ds: Dataset, poly: GeoDataFrame | GeoSeries | GeometryArray)[source]

Return a spatial indexer that selects the indices of the grid cells covering the given geometries.

Parameters:
  • ds (xr.Dataset) – Input dataset.

  • poly (gpd.GeoDataFrame, gpd.GeoSeries, pd.array.GeometryArray, or list of shapely geometries.) – Shapes to cover. Can be of type Polygon, MultiPolygon, Point, or MultiPoint.

Returns:

dict – xarray indexer along native dataset coordinates, to be used as an argument to isel.

Examples

>>> indexer = shape_bbox_indexer(ds, poly)
>>> ds.isel(indexer)

Notes

This is used in particular to restrict the domain of a dataset before computing the weights for a spatial average.

clisops.core.subset.subset_bbox(da: DataArray | Dataset, lon_bnds: array | Tuple[float | None, float | None] = None, lat_bnds: array | Tuple[float | None, float | None] = None, start_date: str | None = None, end_date: str | None = None, first_level: int | float | None = None, last_level: int | float | None = None, time_values: Sequence[str] | None = None, level_values: Sequence[float] | Sequence[int] | None = None) DataArray | Dataset[source]

Subset a DataArray or Dataset spatially (and temporally) using a lat lon bounding box and date selection.

Return a subset of a DataArray or Dataset for grid points falling within a spatial bounding box defined by longitude and latitudinal bounds and for dates falling within provided bounds.

Parameters:
  • da (Union[xarray.DataArray, xarray.Dataset]) – Input data.

  • lon_bnds (Union[np.array, Tuple[Optional[float], Optional[float]]]) – List of minimum and maximum longitudinal bounds. Optional. Defaults to all longitudes in original data-array.

  • lat_bnds (Union[np.array, Tuple[Optional[float], Optional[float]]]) – List of minimum and maximum latitudinal bounds. Optional. Defaults to all latitudes in original data-array.

  • start_date (Optional[str]) – Start date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day(“%Y-%m-%d”). Defaults to first day of input data-array.

  • end_date (Optional[str]) – End date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day(“%Y-%m-%d”). Defaults to last day of input data-array.

  • first_level (Optional[Union[int, float]]) – First level of the subset. Can be either an integer or float. Defaults to first level of input data-array.

  • last_level (Optional[Union[int, float]]) – Last level of the subset. Can be either an integer or float. Defaults to last level of input data-array.

  • time_values (Optional[Sequence[str]]) – A list of datetime strings to subset.

  • level_values (Optional[Union[Sequence[float], Sequence[int]]]) – A list of level values to select.

Returns:

Union[xarray.DataArray, xarray.Dataset] – Subsetted xarray.DataArray or xarray.Dataset

Notes

subset_bbox expects the lower and upper bounds to be provided in ascending order. If the actual coordinate values are in descending order, this will be detected and the selection will be reversed before the data subset is returned.

Examples

import xarray as xr
from clisops.core.subset import subset_bbox

ds = xr.open_dataset(path_to_pr_file)

# Subset lat lon
prSub = subset_bbox(ds.pr, lon_bnds=[-75, -70], lat_bnds=[40, 45])

clisops.core.subset.subset_gridpoint(da: DataArray | Dataset, lon: float | Sequence[float] | DataArray | None = None, lat: float | Sequence[float] | DataArray | None = None, start_date: str | None = None, end_date: str | None = None, first_level: int | float | None = None, last_level: int | float | None = None, tolerance: float | None = None, add_distance: bool = False) DataArray | Dataset[source]

Extract one or more of the nearest gridpoint(s) from the data array based on lat/lon coordinate(s).

Return a subsetted data array (or Dataset) for the grid point(s) falling nearest the input longitude and latitude coordinates. The time series can optionally be subsetted by dates falling within provided bounds. If 1D sequences of coordinates are given, the gridpoints will be concatenated along the new dimension “site”.

Parameters:
  • da (Union[xarray.DataArray, xarray.Dataset]) – Input data.

  • lon (Optional[Union[float, Sequence[float], xarray.DataArray]]) – Longitude coordinate(s). Must be of the same length as lat.

  • lat (Optional[Union[float, Sequence[float], xarray.DataArray]]) – Latitude coordinate(s). Must be of the same length as lon.

  • start_date (Optional[str]) – Start date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day(“%Y-%m-%d”). Defaults to first day of input data-array.

  • end_date (Optional[str]) – End date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day(“%Y-%m-%d”). Defaults to last day of input data-array.

  • first_level (Optional[Union[int, float]]) – First level of the subset. Can be either an integer or float. Defaults to first level of input data-array.

  • last_level (Optional[Union[int, float]]) – Last level of the subset. Can be either an integer or float. Defaults to last level of input data-array.

  • tolerance (Optional[float]) – Masks values if the distance to the nearest gridpoint is larger than tolerance in meters.

  • add_distance (bool)

Returns:

Union[xarray.DataArray, xarray.Dataset] – Subsetted xarray.DataArray or xarray.Dataset

Examples

import xarray as xr
from clisops.core.subset import subset_gridpoint

ds = xr.open_dataset(path_to_pr_file)

# Subset lat lon point
prSub = subset_gridpoint(ds.pr, lon=-75, lat=45)

# Subset multiple variables in a single dataset
ds = xr.open_mfdataset([path_to_tasmax_file, path_to_tasmin_file])
dsSub = subset_gridpoint(ds, lon=-75, lat=45)

clisops.core.subset.subset_level(da: DataArray | Dataset, first_level: int | float | str | None = None, last_level: int | float | str | None = None) DataArray | Dataset[source]

Subset input DataArray or Dataset based on first and last levels.

Return a subset of a DataArray or Dataset for levels falling within the provided bounds.

Parameters:
  • da (Union[xarray.DataArray, xarray.Dataset]) – Input data.

  • first_level (Optional[Union[int, float, str]]) – First level of the subset (specified as the value, not the index). Can be either an integer or float. Defaults to first level of input data-array.

  • last_level (Optional[Union[int, float, str]]) – Last level of the subset (specified as the value, not the index). Can be either an integer or float. Defaults to last level of input data-array.

Returns:

Union[xarray.DataArray, xarray.Dataset] – Subsetted xarray.DataArray or xarray.Dataset

Examples

import xarray as xr
from clisops.core.subset import subset_level

ds = xr.open_dataset(path_to_pr_file)

# Subset complete levels
prSub = subset_level(ds.pr, first_level=0, last_level=30)

# Subset single level
prSub = subset_level(ds.pr, first_level=1000, last_level=1000)

# Subset multiple variables in a single dataset
ds = xr.open_mfdataset([path_to_tasmax_file, path_to_tasmin_file])
dsSub = subset_level(ds, first_level=1000.0, last_level=850.0)

clisops.core.subset.subset_level_by_values(da: DataArray | Dataset, level_values: Sequence[float] | Sequence[int] | None = None) DataArray | Dataset[source]

Subset input DataArray or Dataset based on a sequence of vertical level values.

Return a subset of a DataArray or Dataset for levels matching those requested.

Parameters:
  • da (Union[xarray.DataArray, xarray.Dataset]) – Input data.

  • level_values (Optional[Union[Sequence[float], Sequence[int]]]) – A list of level values to select.

Returns:

Union[xarray.DataArray, xarray.Dataset] – Subsetted xarray.DataArray or xarray.Dataset

Examples

import xarray as xr
from clisops.core.subset import subset_level_by_values

ds = xr.open_dataset(path_to_pr_file)

# Subset a selection of levels
levels = [1000.0, 850.0, 250.0, 100.0]
prSub = subset_level_by_values(ds.pr, level_values=levels)

Notes

If any levels are not found, a ValueError will be raised. The requested levels will automatically be re-ordered to match the order in the input dataset.

clisops.core.subset.subset_shape(ds: DataArray | Dataset, shape: str | Path | GeoDataFrame, raster_crs: int | str | None = None, shape_crs: int | str | None = None, buffer: int | float | None = None, start_date: str | None = None, end_date: str | None = None, first_level: int | float | None = None, last_level: int | float | None = None) DataArray | Dataset[source]

Subset a DataArray or Dataset spatially (and temporally) using a vector shape and date selection.

Return a subset of a DataArray or Dataset for grid points falling within the area of a Polygon and/or MultiPolygon shape, or grid points along the path of a LineString and/or MultiLineString. If the shape consists of several disjoint polygons, the output is cut to the smallest bbox including all polygons.

Parameters:
  • ds (Union[xarray.DataArray, xarray.Dataset]) – Input values.

  • shape (Union[str, Path, gpd.GeoDataFrame]) – Path to a shape file, or GeoDataFrame directly. Supports GeoPandas-compatible formats.

  • raster_crs (Optional[Union[str, int]]) – EPSG number or PROJ4 string.

  • shape_crs (Optional[Union[str, int]]) – EPSG number or PROJ4 string.

  • buffer (Optional[Union[int, float]]) – Buffer the shape in order to select a larger region stemming from it. Units are those of the shape’s coordinate reference system (degrees or metres).

  • start_date (Optional[str]) – Start date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day(“%Y-%m-%d”). Defaults to first day of input data-array.

  • end_date (Optional[str]) – End date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day(“%Y-%m-%d”). Defaults to last day of input data-array.

  • first_level (Optional[Union[int, float]]) – First level of the subset. Can be either an integer or float. Defaults to first level of input data-array.

  • last_level (Optional[Union[int, float]]) – Last level of the subset. Can be either an integer or float. Defaults to last level of input data-array.

Returns:

Union[xarray.DataArray, xarray.Dataset] – A subset of ds

Notes

If no CRS is found in the shape provided (e.g. RFC-7946 GeoJSON, https://en.wikipedia.org/wiki/GeoJSON), assumes a decimal degree datum (CRS84). Be advised that EPSG:4326 and OGC:CRS84 are not identical as axis order of lat and long differs between the two (for more information, see: https://github.com/OSGeo/gdal/issues/2035).

Examples

import xarray as xr
from clisops.core.subset import subset_shape

pr = xr.open_dataset(path_to_pr_file).pr

# Subset data array by shape
prSub = subset_shape(pr, shape=path_to_shape_file)

# Subset data array by shape and single year
prSub = subset_shape(
    pr, shape=path_to_shape_file, start_date="1990-01-01", end_date="1990-12-31"
)

# Subset multiple variables in a single dataset
ds = xr.open_mfdataset([path_to_tasmin_file, path_to_tasmax_file])
dsSub = subset_shape(ds, shape=path_to_shape_file)

clisops.core.subset.subset_time(da: DataArray | Dataset, start_date: str | None = None, end_date: str | None = None) DataArray | Dataset[source]

Subset input DataArray or Dataset based on start and end years.

Return a subset of a DataArray or Dataset for dates falling within the provided bounds.

Parameters:
  • da (Union[xarray.DataArray, xarray.Dataset]) – Input data.

  • start_date (Optional[str]) – Start date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day(“%Y-%m-%d”). Defaults to first day of input data-array.

  • end_date (Optional[str]) – End date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day(“%Y-%m-%d”). Defaults to last day of input data-array.

Returns:

Union[xarray.DataArray, xarray.Dataset] – Subsetted xarray.DataArray or xarray.Dataset

Examples

import xarray as xr
from clisops.core.subset import subset_time

ds = xr.open_dataset(path_to_pr_file)

# Subset complete years
prSub = subset_time(ds.pr, start_date="1990", end_date="1999")

# Subset single complete year
prSub = subset_time(ds.pr, start_date="1990", end_date="1990")

# Subset multiple variables in a single dataset
ds = xr.open_mfdataset([path_to_tasmax_file, path_to_tasmin_file])
dsSub = subset_time(ds, start_date="1990", end_date="1999")

# Subset with year-month precision - Example subset 1990-03-01 to 1999-08-31 inclusively
txSub = subset_time(ds.tasmax, start_date="1990-03", end_date="1999-08")

# Subset with specific start_dates and end_dates
tnSub = subset_time(ds.tasmin, start_date="1990-03-13", end_date="1990-08-17")

Notes

Avoid end dates such as “%Y-%m-31”, since not every month (or calendar type) includes day 31. To subset a complete month, use only the year-month form (“%Y-%m”).

clisops.core.subset.subset_time_by_components(da: DataArray | Dataset, *, time_components: Dict | None = None) DataArray[source]

Subset by one or more time components (year, month, day, etc.).

Parameters:
  • da (Union[xarray.DataArray, xarray.Dataset]) – Input data.

  • time_components (Union[Dict, None]) – Dictionary of time components to subset by, e.g. {“month”: [12, 1, 2]} to select December, January and February (see the example below). Default: None.

Returns:

xarray.DataArray

Examples

import xarray as xr
from clisops.core.subset import subset_time_by_components

# To select all Winter months (Dec, Jan, Feb) or [12, 1, 2]:
da = xr.open_dataset(path_to_file).pr
winter_dict = {"month": [12, 1, 2]}
res = subset_time_by_components(da, time_components=winter_dict)

clisops.core.subset.subset_time_by_values(da: DataArray | Dataset, time_values: Sequence[str] | None = None) DataArray | Dataset[source]

Subset input DataArray or Dataset based on a sequence of datetime strings.

Return a subset of a DataArray or Dataset for datetimes matching those requested.

Parameters:
  • da (Union[xarray.DataArray, xarray.Dataset]) – Input data.

  • time_values (Optional[Sequence[str]]) – A list of datetime strings to select. Default: None.

Returns:

Union[xarray.DataArray, xarray.Dataset] – Subsetted xarray.DataArray or xarray.Dataset

Examples

import xarray as xr
from clisops.core.subset import subset_time_by_values

ds = xr.open_dataset(path_to_pr_file)

# Subset a selection of datetimes
times = ["2015-01-01", "2018-12-05", "2021-06-06"]
prSub = subset_time_by_values(ds.pr, time_values=times)

Notes

If any datetimes are not found, a ValueError will be raised. The requested datetimes will automatically be re-ordered to match the order in the input dataset.

Core average functionality

Average module.

clisops.core.average.average_over_dims(ds: DataArray | Dataset, dims: Sequence[str] = None, ignore_undetected_dims: bool = False) DataArray | Dataset[source]

Average a DataArray or Dataset over the dimensions specified.

Parameters:
  • ds (Union[xr.DataArray, xr.Dataset]) – Input values.

  • dims (Sequence[{“time”, “level”, “latitude”, “longitude”}]) – The dimensions over which to apply the average. If None, none of the dimensions are averaged over. Dimensions must be one of [“time”, “level”, “latitude”, “longitude”].

  • ignore_undetected_dims (bool) – If False, an exception is raised when any of the specified dimensions cannot be found in the dataset. If True, no exception is raised and the remaining (detected) dimensions are averaged over. Default = False.

Returns:

Union[xr.DataArray, xr.Dataset] – New Dataset or DataArray object averaged over the indicated dimensions. The indicated dimensions will have been removed.

Examples

import xarray as xr

from clisops.core.average import average_over_dims

pr = xr.open_dataset(path_to_pr_file).pr

# Average data array over latitude and longitude
prAvg = average_over_dims(
    pr, dims=["latitude", "longitude"], ignore_undetected_dims=True
)

clisops.core.average.average_shape(ds: Dataset, shape: str | Path | GeoDataFrame, variable: str | Sequence[str] = None) DataArray | Dataset[source]

Average a DataArray or Dataset spatially using vector shapes.

Return a DataArray or Dataset averaged over each Polygon given. Requires xESMF.

Parameters:
  • ds (xarray.Dataset) – Input values, coordinate attributes must be CF-compliant.

  • shape (Union[str, Path, gpd.GeoDataFrame]) – Path to shape file, or directly a GeoDataFrame. Supports formats compatible with geopandas. Will be converted to EPSG:4326 if needed.

  • variable (Union[str, Sequence[str], None]) – Variables to average. If None, average over all data variables.

Returns:

Union[xarray.DataArray, xarray.Dataset] – ds spatially-averaged over the polygon(s) in shape. Has a new geom dimension corresponding to the index of the input GeoDataFrame. Non-geometry columns of the GeoDataFrame are copied as auxiliary coordinates.

Notes

The spatial weights are computed with ESMF, which uses cell corners given in lat/lon format (EPSG:4326); the input dataset ds must provide those. In contrast to subset.subset_shape, the weights computed here take partial overlaps and holes into account.

As xESMF computes the weight masks only once, skipping missing values is not really feasible. Thus, all NaNs propagate when performing the average.

Examples

import xarray as xr
from clisops.core.average import average_shape

pr = xr.open_dataset(path_to_pr_file).pr

# Average data array over shape
prAvg = average_shape(pr, shape=path_to_shape_file)

# Average multiple variables in a single dataset
ds = xr.open_mfdataset([path_to_tasmin_file, path_to_tasmax_file])
dsAvg = average_shape(ds, shape=path_to_shape_file)

clisops.core.average.average_time(ds: DataArray | Dataset, freq: str) DataArray | Dataset[source]

Average a DataArray or Dataset over the time frequency specified.

Parameters:
  • ds (Union[xr.DataArray, xr.Dataset]) – Input values.

  • freq (str) – The frequency to average over. One of “month”, “year”.

Returns:

Union[xr.DataArray, xr.Dataset] – New Dataset or DataArray object averaged over the indicated time frequency.

Examples

import xarray as xr

from clisops.core.average import average_time

pr = xr.open_dataset(path_to_pr_file).pr

# Average data array over each month
prAvg = average_time(pr, freq="month")

Core regrid functionality

Regrid module.

class clisops.core.regrid.Grid(ds: Dataset | DataArray | None = None, grid_id: str | None = None, grid_instructor: tuple | float | int | None = None, compute_bounds: bool | None = False)[source]

Bases: object

Create a Grid object that is suitable to serve as source or target grid of the Weights class.

Pre-processes the coordinate variables of the input dataset (e.g. create or read the dataset from input, reformat, generate bounds, identify duplicated and collapsing cells, determine the zonal / east-west extent).

Parameters:
  • ds (xr.Dataset or xr.DataArray, optional) – Uses horizontal coordinates of an xarray.Dataset or xarray.DataArray to create a Grid object. The default is None.

  • grid_id (str, optional) – Create the Grid object from a selection of pre-defined grids, e.g. “1deg” or “2pt5deg”. The grids are provided via the roocs_grids package (https://github.com/roocs/roocs-grids). A special setting is “adaptive”/”auto”, which requires the parameter ‘ds’ to be specified as well, and creates a regular lat-lon grid of the same extent and approximate resolution as the grid described by ‘ds’. The default is None.

  • grid_instructor (tuple, float or int, optional) – Create a regional or global regular lat-lon grid using xESMF utility functions. Global grid: grid_instructor = (lon_step, lat_step) or grid_instructor = step. Regional grid: grid_instructor = (lon_start, lon_end, lon_step, lat_start, lat_end, lat_step) or grid_instructor = (start, end, step). The default is None.

  • compute_bounds (bool, optional) – Compute latitude and longitude bounds if the dataset has none defined. The default is False.
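
Examples

A minimal sketch of constructing Grid objects from the parameters described above; ds and path_to_input_file are placeholders for user data:

import xarray as xr

from clisops.core.regrid import Grid

ds = xr.open_dataset(path_to_input_file)

# Grid object built from the horizontal coordinates of an existing dataset
source_grid = Grid(ds=ds)

# Pre-defined global 1-degree grid provided via the roocs_grids package
target_grid = Grid(grid_id="1deg")

# Global regular lat-lon grid from a grid instructor (lon_step, lat_step)
half_deg_grid = Grid(grid_instructor=(0.5, 0.5))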

compare_grid(ds_or_Grid: Dataset | Grid, verbose: bool = False) bool[source]

Compare two Grid objects.

Compares the checksums of the two Grid objects, which depend on the lat and lon coordinate variables, their bounds and, if defined, a mask.

Parameters:
  • ds_or_Grid (xarray.Dataset or Grid) – Grid that the current Grid object shall be compared to.

  • verbose (bool) – Whether to also print the result. The default is False.

Returns:

bool – Returns True if the two Grids are considered identical within the defined precision, else returns False.

detect_bounds(coordinate: str) str | None[source]

Use cf_xarray to obtain the variable name of the requested coordinate’s bounds.

Parameters:

coordinate (str) – Name of the coordinate variable to determine the bounds from.

Returns:

str, optional – Returns the variable name of the requested coordinate bounds. Returns None if the variable has no bounds or if they cannot be identified.

detect_coordinate(coord_type: str) str[source]

Use cf_xarray to obtain the variable name of the requested coordinate.

Parameters:

coord_type (str) – Coordinate type understood by cf-xarray, e.g. ‘lat’, ‘lon’, …

Raises:

AttributeError – Raised if the requested coordinate cannot be identified.

Returns:

str – Coordinate variable name.

detect_extent() str[source]

Determine the grid extent in zonal / east-west direction (‘regional’ or ‘global’).

Returns:

str – ‘regional’ or ‘global’.

detect_format() str[source]

Detect the format of a dataset. Currently supported formats are ‘CF’, ‘SCRIP’ and ‘xESMF’.

Returns:

str – The detected format if supported; otherwise an exception is raised.

detect_shape() tuple[int, int, int][source]

Detect the shape of the grid.

Returns a tuple of (nlat, nlon, ncells). For an unstructured grid nlat and nlon are not defined and therefore the returned tuple will be (ncells, ncells, ncells).

Returns:

  • int – Number of latitude points in the grid.

  • int – Number of longitude points in the grid.

  • int – Number of cells in the grid.

detect_type() str[source]

Detect type of the grid as one of “regular_lat_lon”, “curvilinear”, or “unstructured”.

Otherwise, will issue an Exception if grid type is not supported.

Returns:

str – The detected grid type.

grid_reformat(grid_format: str, keep_attrs: bool = False)[source]

Reformat the Dataset attached to the Grid object to a target format.

Parameters:
  • grid_format (str) – Target format of the reformat operation. Currently supported formats are ‘SCRIP’, ‘CF’ and ‘xESMF’.

  • keep_attrs (bool) – Whether to keep the global attributes.

Returns:

ds_ref (xarray.Dataset) – Reformatted dataset.

to_netcdf(folder: str | Path | None = './', filename: str | None = '', grid_format: str | None = 'CF', keep_attrs: bool | None = True)[source]

Store a copy of the horizontal Grid as netCDF file on disk.

Define output folder, filename and output format (currently only ‘CF’ is supported). Does not overwrite an existing file.

Parameters:
  • folder (str or Path, optional) – Output folder. The default is the current working directory “./”.

  • filename (str, optional) – Output filename, to be defined separately from folder. The default is ‘grid_<grid.id>.nc’.

  • grid_format (str, optional) – The format the grid information shall be stored as (in terms of variable attributes and dimensions). The default is “CF”, which is currently the only supported output format.

  • keep_attrs (bool, optional) – Whether to store the global attributes in the output netCDF file. The default is True.

class clisops.core.regrid.Weights(grid_in: Grid, grid_out: Grid, method: str, from_disk: str | Path | None = None, format: str | None = None)[source]

Bases: object

Creates remapping weights out of two Grid objects serving as source and target grid.

Reads weights from the cache if possible. Reads weights from disk if specified (not yet implemented); in that case, the weight file format has to be supported so that it can be reformatted to the xESMF format.

Parameters:
  • grid_in (Grid) – Grid object serving as source grid.

  • grid_out (Grid) – Grid object serving as target grid.

  • method (str) – Remapping method the weights should be / have been calculated with. One of [“nearest_s2d”, “bilinear”, “conservative”, “patch”] if weights have to be calculated. Free text if weights are read from disk.

  • from_disk (str, optional) – Not yet implemented. Instead of calculating the regridding weights (or reading them from the cache), read them from disk. The default is None.

  • format (str, optional) – Not yet implemented. When reading weights from disk, the input format may be specified. If omitted, there will be an attempt to detect the format. The default is None.
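
Examples

A minimal sketch of creating remapping weights from two Grid objects; the grids are assumed to have been created as shown in the Grid examples above, with ds as a placeholder source dataset:

from clisops.core.regrid import Grid, Weights

source_grid = Grid(ds=ds)
target_grid = Grid(grid_id="1deg")

# Compute (or read from the local cache) bilinear remapping weights
weights = Weights(grid_in=source_grid, grid_out=target_grid, method="bilinear")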

reformat(format_from: str, format_to: str) None[source]

Reformat remapping weights.

Warning

This method is not yet implemented.

save_to_disk(filename=None, wformat: str = 'xESMF') None[source]

Write weights to disk in a certain format.

Warning

This method is not yet implemented.

clisops.core.regrid.regrid(grid_in: Grid, grid_out: Grid, weights: Weights, adaptive_masking_threshold: float | None = 0.5, keep_attrs: bool | str = True) Dataset[source]

Perform the regridding operation, including handling of dataset and variable attributes.

Parameters:
  • grid_in (Grid) – Grid object of the source grid, e.g. created out of source xarray.Dataset.

  • grid_out (Grid) – Grid object of the target grid.

  • weights (Weights) – Weights object, as created by using grid_in and grid_out Grid objects as input.

  • adaptive_masking_threshold (float, optional) – (AMT) A value within the [0., 1.] interval that defines the maximum RATIO of missing values amongst the total number of data values contributing to the calculation of the target grid cell value. If the fraction of contributing source data that is missing lies in [0., AMT[, the target grid cell value is re-normalized by the factor 1./(1.-RATIO); otherwise it is set to missing_value. Thus, if AMT is set to 1, all source grid cells that contribute to a target grid cell must be missing in order for the target grid cell to be defined as missing itself. Values greater than 1 or less than 0 turn adaptive masking off. This adaptive masking technique allows generated weights to be reused for differently masked data (e.g. land-sea masks or orographic masks that vary with depth / height). The default is 0.5.

  • keep_attrs (bool or str) – Sets the global attributes of the resulting dataset, apart from the ones set by this routine. True: attributes of grid_in.ds are kept in the resulting dataset; False: only the attributes newly set by this routine are kept; “target”: attributes of grid_out.ds are kept in the resulting dataset. The default is True.

Returns:

xarray.Dataset – The regridded data in form of an xarray.Dataset.
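
Examples

A minimal sketch chaining the Grid, Weights and regrid building blocks documented above; ds and path_to_input_file are placeholders:

import xarray as xr

from clisops.core.regrid import Grid, Weights, regrid

ds = xr.open_dataset(path_to_input_file)

source_grid = Grid(ds=ds)
target_grid = Grid(grid_id="1deg")
weights = Weights(grid_in=source_grid, grid_out=target_grid, method="bilinear")

# Remap to the target grid, keeping the global attributes of the source dataset
ds_out = regrid(source_grid, target_grid, weights, adaptive_masking_threshold=0.5, keep_attrs=True)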

clisops.core.regrid.weights_cache_flush(weights_dir_init: str | Path | None = '', dryrun: bool | None = False, verbose: bool | None = False) None[source]

Flush and reinitialize the local weights cache.

Parameters:
  • weights_dir_init (str, optional) – Directory name to reinitialize the local weights cache in. Will be created if it does not exist. The default is CONFIG[“clisops:grid_weights”][“local_weights_dir”] as defined in roocs.ini (or as redefined by a manual weights_cache_init call).

  • dryrun (bool, optional) – If True, it will only print all files that would get deleted. The default is False.

  • verbose (bool, optional) – If True, and dryrun is False, will print all files that are getting deleted. The default is False.

Returns:

None

clisops.core.regrid.weights_cache_init(weights_dir: str | Path | None = None, config: dict = <configuration dictionary read from roocs.ini>) None[source]

Initialize global variable weights_dir as used by the Weights class.

Parameters:
  • weights_dir (str or Path) – Directory name to initialize the local weights cache in. Will be created if it does not exist. By default, this function is called upon import with weights_dir as defined in roocs.ini.

  • config (dict) – Configuration dictionary as read from roocs.ini.

Returns:

None
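
Examples

A minimal sketch of managing the local weights cache; the cache directory shown is a placeholder:

from clisops.core.regrid import weights_cache_flush, weights_cache_init

# Point the local weights cache at a custom directory (created if missing)
weights_cache_init("/path/to/my/weights_cache")

# Preview which cached files would be removed, then flush and reinitialize
weights_cache_flush(dryrun=True)
weights_cache_flush()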

Subset operation

class clisops.ops.subset.Subset(ds, file_namer: str = 'standard', split_method: str = 'time:auto', output_dir: str | Path | None = None, output_type: str = 'netcdf', **params)[source]

Bases: Operation

clisops.ops.subset.subset(ds: Dataset | str | Path, *, time: str | Tuple[str, str] | TimeParameter | Series | Interval | None = None, area: str | Tuple[int | float | str, int | float | str, int | float | str, int | float | str] | AreaParameter | None = None, level: str | Tuple[int | float | str, int | float | str] | LevelParameter | Interval | None = None, time_components: str | Dict | TimeComponents | TimeComponentsParameter | None = None, output_dir: str | Path | None = None, output_type='netcdf', split_method='time:auto', file_namer='standard') List[Dataset | str][source]

Subset operation.

Parameters:
  • ds (Union[xr.Dataset, str])

  • time (Optional[Union[str, Tuple[str, str], TimeParameter, Series, Interval]] = None,)

  • area (str or AreaParameter or Tuple[Union[int, float, str], Union[int, float, str], Union[int, float, str], Union[int, float, str]], optional)

  • level (Optional[Union[str, Tuple[Union[int, float, str], Union[int, float, str]], LevelParameter, Interval] = None,)

  • time_components (Optional[Union[str, Dict, TimeComponentsParameter]] = None,)

  • output_dir (Optional[Union[str, Path]] = None)

  • output_type ({“netcdf”, “nc”, “zarr”, “xarray”})

  • split_method ({“time:auto”})

  • file_namer ({“standard”, “simple”})

Returns:

List[Union[xr.Dataset, str]] – A list of the subsetted outputs in the format selected; str corresponds to file paths if the output format selected is a file.

Examples

ds: xarray Dataset or “cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.latest.tas”
time: (“1999-01-01T00:00:00”, “2100-12-30T00:00:00”) or “2085-01-01T12:00:00Z/2120-12-30T12:00:00Z”
area: (-5.,49.,10.,65) or “0.,49.,10.,65” or [0, 49.5, 10, 65] with the order being lon_0, lat_0, lon_1, lat_1
level: (1000.,) or “1000/2000” or (“1000.50”, “2000.60”)
time_components: “year:2000,2004,2008|month:01,02” or {“year”: (2000, 2004, 2008), “months”: (1, 2)}
output_dir: “/cache/wps/procs/req0111”
output_type: “netcdf”
split_method: “time:auto”
file_namer: “standard”
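
As a sketch, values such as those above translate into a call like the following (ds is a placeholder; output_type="xarray" returns the results as datasets instead of writing files):

from clisops.ops.subset import subset

results = subset(
    ds=ds,
    time=("1999-01-01T00:00:00", "2100-12-30T00:00:00"),
    area=(-5.0, 49.0, 10.0, 65.0),
    output_type="xarray",
)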

Note

If you request a selection range (such as level, latitude or longitude) that specifies the lower and upper bounds in the opposite direction to the actual coordinate values then clisops.ops.subset will detect this issue and reverse your selection before returning the data subset.

Average operations

clisops.ops.average.average_over_dims(ds: Dataset | str, dims: Sequence[str] | DimensionParameter | None = None, ignore_undetected_dims: bool = False, output_dir: str | Path | None = None, output_type: str = 'netcdf', split_method: str = 'time:auto', file_namer: str = 'standard') List[Dataset | str][source]

Calculate an average over given dimensions.

Parameters:
  • ds (Union[xr.Dataset, str]) – Xarray dataset.

  • dims (Optional[Union[Sequence[{“time”, “level”, “latitude”, “longitude”}], DimensionParameter]]) – The dimensions over which to apply the average. If None, none of the dimensions are averaged over. Dimensions must be one of [“time”, “level”, “latitude”, “longitude”].

  • ignore_undetected_dims (bool) – If False, an exception is raised when any of the specified dimensions cannot be found in the dataset. If True, no exception is raised and the remaining (detected) dimensions are averaged over. Default = False.

  • output_dir (Optional[Union[str, Path]])

  • output_type ({“netcdf”, “nc”, “zarr”, “xarray”})

  • split_method ({“time:auto”})

  • file_namer ({“standard”, “simple”})

Returns:

List[Union[xr.Dataset, str]] – A list of the outputs in the format selected; str corresponds to file paths if the output format selected is a file.

Examples

ds: xarray Dataset or “cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.latest.tas”
dims: [‘latitude’, ‘longitude’]
ignore_undetected_dims: False
output_dir: “/cache/wps/procs/req0111”
output_type: “netcdf”
split_method: “time:auto”
file_namer: “standard”
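
As a sketch, the values above correspond to a call such as the following (ds is a placeholder; output_type="xarray" returns datasets instead of writing files):

from clisops.ops.average import average_over_dims

results = average_over_dims(
    ds,
    dims=["latitude", "longitude"],
    ignore_undetected_dims=False,
    output_type="xarray",
)
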
clisops.ops.average.average_shape(ds: Dataset | Path | str, shape: str | Path | GeoDataFrame, variable: str | Sequence[str] | None = None, output_dir: str | Path | None = None, output_type: str = 'netcdf', split_method: str = 'time:auto', file_namer: str = 'standard') List[Dataset | str][source]

Calculate a spatial average over a given shape.

Parameters:
  • ds (Union[xr.Dataset, str]) – Xarray dataset.

  • shape (Union[str, Path, gpd.GeoDataFrame]) – Path to shape file, or directly a GeoDataFrame. Supports formats compatible with geopandas. Will be converted to EPSG:4326 if needed.

  • variable (Optional[Union[str, Sequence[str], None]]) – Variables to average. If None, average over all data variables.

  • output_dir (Optional[Union[str, Path]])

  • output_type ({“netcdf”, “nc”, “zarr”, “xarray”})

  • split_method ({“time:auto”})

  • file_namer ({“standard”, “simple”})

Returns:

List[Union[xr.Dataset, str]] – A list of the outputs in the format selected; str corresponds to file paths if the output format selected is a file.

Examples

ds: xarray Dataset or “cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.latest.tas”
shape: “path_to_shape_file” or a GeoDataFrame
variable: None
output_dir: “/cache/wps/procs/req0111”
output_type: “netcdf”
split_method: “time:auto”
file_namer: “standard”
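
As a sketch, a corresponding call could look as follows (ds and path_to_shape_file are placeholders):

from clisops.ops.average import average_shape

results = average_shape(
    ds,
    shape=path_to_shape_file,
    output_type="xarray",
)
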
clisops.ops.average.average_time(ds: Dataset | str, freq: str, output_dir: str | Path | None = None, output_type: str = 'netcdf', split_method: str = 'time:auto', file_namer: str = 'standard') List[Dataset | str][source]

Calculate an average over the time frequency specified.

Parameters:
  • ds (Union[xr.Dataset, str]) – Xarray dataset.

  • freq (str) – The frequency to average over. One of “month”, “year”.

  • output_dir (Optional[Union[str, Path]])

  • output_type ({“netcdf”, “nc”, “zarr”, “xarray”})

  • split_method ({“time:auto”})

  • file_namer ({“standard”, “simple”})

Returns:

List[Union[xr.Dataset, str]] – A list of the outputs in the format selected; str corresponds to file paths if the output format selected is a file.

Examples

ds: xarray Dataset or “cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.latest.tas”
freq: “month” or “year”
output_dir: “/cache/wps/procs/req0111”
output_type: “netcdf”
split_method: “time:auto”
file_namer: “standard”
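
As a sketch, a corresponding call could look as follows (ds is a placeholder):

from clisops.ops.average import average_time

results = average_time(
    ds,
    freq="month",
    output_type="xarray",
)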

Regrid operation

clisops.ops.regrid.regrid(ds: Dataset | str | Path, *, method: str | None = 'nearest_s2d', adaptive_masking_threshold: int | float | None = 0.5, grid: Dataset | DataArray | int | float | tuple | str | None = 'adaptive', output_dir: str | Path | None = None, output_type: str | None = 'netcdf', split_method: str | None = 'time:auto', file_namer: str | None = 'standard', keep_attrs: bool | str | None = True) List[Dataset | str][source]

Regrid specified input file or xarray object.

Parameters:
  • ds (Union[xr.Dataset, str])

  • method ({“nearest_s2d”, “conservative”, “patch”, “bilinear”})

  • adaptive_masking_threshold (Optional[Union[int, float]])

  • grid (Union[xr.Dataset, xr.DataArray, int, float, tuple, str])

  • output_dir (Optional[Union[str, Path]] = None)

  • output_type ({“netcdf”, “nc”, “zarr”, “xarray”})

  • split_method ({“time:auto”})

  • file_namer ({“standard”, “simple”})

  • keep_attrs ({True, False, “target”})

Returns:

List[Union[xr.Dataset, str]] – A list of the regridded outputs in the format selected; str corresponds to file paths if the output format selected is a file.

Examples

ds: xarray Dataset or “cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.latest.tas”
method: “nearest_s2d”
adaptive_masking_threshold:
grid: “1deg”
output_dir: “/cache/wps/procs/req0111”
output_type: “netcdf”
split_method: “time:auto”
file_namer: “standard”
keep_attrs: True
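
As a sketch, the values above correspond to a call such as the following (ds is a placeholder; output_type="xarray" returns the regridded dataset instead of writing files):

from clisops.ops.regrid import regrid

results = regrid(
    ds,
    method="nearest_s2d",
    grid="1deg",
    output_type="xarray",
)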

Common functions

clisops.utils.common.check_dir(func: function, dr: str | Path)[source]

Decorator ensuring that the directory dr exists before the function/method is called.

clisops.utils.common.enable_logging() List[int][source]
clisops.utils.common.expand_wildcards(paths: str | Path) list[source]

Expand the wildcards that may be present in Paths.

clisops.utils.common.require_module(func: function, module: module, module_name: str, min_version: str | None = '0.0.0', max_supported_version: str | None = None, max_supported_warning: str | None = None)[source]

Decorator ensuring that module is installed before the function/method is called.

Output Utilities

class clisops.utils.output_utils.FileLock(fpath)[source]

Bases: object

Create and release a lockfile.

Adapted from https://github.com/cedadev/cmip6-object-store/cmip6_zarr/file_lock.py

acquire(timeout=10)[source]

Create actual lockfile, raise error if already exists beyond ‘timeout’.

release()[source]

Release lock, i.e. delete lockfile.
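
Examples

A minimal usage sketch; the lockfile path is a placeholder:

from clisops.utils.output_utils import FileLock

lock = FileLock("/tmp/output.nc.lock")
lock.acquire(timeout=10)
try:
    ...  # write the output while holding the lock
finally:
    lock.release()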

clisops.utils.output_utils.check_format(fmt)[source]

Check that the requested format exists.

clisops.utils.output_utils.create_lock(fname: str | Path)[source]

Check whether the lockfile already exists and, if not, create it.

Parameters:

fname (str) – Path of the lockfile to be created.

Returns:

FileLock object or None.

clisops.utils.output_utils.filter_times_within(times, start=None, end=None)[source]

Take an array of datetimes and return a reduced array if start or end times are defined and fall within the main array.

clisops.utils.output_utils.get_chunk_length(da)[source]

Calculate the chunk length to use when chunking xarray datasets.

Based on memory limit provided in config and the size of the dataset.

clisops.utils.output_utils.get_da(ds)[source]

Return an xr.DataArray from ds, where ds may be either an xr.Dataset or an xr.DataArray.

clisops.utils.output_utils.get_format_extension(fmt)[source]

Finds the extension for the requested output format.

clisops.utils.output_utils.get_format_writer(fmt)[source]

Finds the output method for the requested output format.

clisops.utils.output_utils.get_output(ds, output_type, output_dir, namer)[source]

Return the output after determining the output format and applying chunking.

clisops.utils.output_utils.get_time_slices(ds: Dataset | DataArray, split_method, start=None, end=None, file_size_limit: str = None) List[Tuple[str, str]][source]

Take an xarray Dataset or DataArray, assume it can be split on the time axis into a sequence of slices. Optionally, take a start and end date to specify a sub-slice of the main time axis.

Use the prescribed file size limit to generate a list of (“YYYY-MM-DD”, “YYYY-MM-DD”) slices so that the output files do not (significantly) exceed the file size limit.

Parameters:
  • ds (Union[xr.Dataset, xr.DataArray])

  • split_method

  • start

  • end

  • file_size_limit (str) – a string specifying “<number><units>”.

Returns:

List[Tuple[str, str]]
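
Examples

A sketch, assuming ds is a multi-file dataset opened by the user and paths_to_files is a placeholder:

import xarray as xr

from clisops.utils.output_utils import get_time_slices

ds = xr.open_mfdataset(paths_to_files)

# Split the time axis so each output file stays below roughly 1 GB
slices = get_time_slices(ds, "time:auto", file_size_limit="1GB")
# -> a list of ("YYYY-MM-DD", "YYYY-MM-DD") tuples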

File Namers

class clisops.utils.file_namers.SimpleFileNamer(replace=None, extra='')[source]

Bases: _BaseFileNamer

Simple file namer class.

Generates numbered file names.

class clisops.utils.file_namers.StandardFileNamer(replace=None, extra='')[source]

Bases: SimpleFileNamer

Standard file namer class.

Generates file names based on input dataset.

get_file_name(ds, fmt='nc')[source]

Constructs file name.

clisops.utils.file_namers.get_file_namer(name)[source]

Returns the correct filenamer from the provided name.

Dataset Utilities

clisops.utils.dataset_utils.add_hor_CF_coord_attrs(ds, lat='lat', lon='lon', lat_bnds='lat_bnds', lon_bnds='lon_bnds', keep_attrs=False)[source]

Add the common CF variable attributes to the horizontal coordinate variables.

Parameters:
  • ds (xarray.Dataset) – Dataset to which the coordinate variable attributes will be added.

  • lat (str, optional) – Latitude coordinate variable name. The default is “lat”.

  • lon (str, optional) – Longitude coordinate variable name. The default is “lon”.

  • lat_bnds (str, optional) – Latitude bounds coordinate variable name. The default is “lat_bnds”.

  • lon_bnds (str, optional) – Longitude bounds coordinate variable name. The default is “lon_bnds”.

  • keep_attrs (bool, optional) – Whether to keep original coordinate variable attributes if they do not conflict. In case of a conflict, the attribute value will be overwritten independent of this setting. The default is False.

Returns:

xarray.Dataset – The input dataset with updated coordinate variable attributes.

clisops.utils.dataset_utils.adjust_date_to_calendar(da, date, direction='backwards')[source]

Check that the date specified exists in the calendar type of the dataset.

If not present, changes the date a day at a time (up to a maximum of 5 times) to find a date that does exist. The direction to change the date by is indicated by ‘direction’.

Parameters:
  • da (xarray.Dataset or xarray.DataArray) – The data to examine.

  • date (str) – The date to check.

  • direction (str) – The direction to move in days to find a date that does exist. ‘backwards’ means the search will go backwards in time until an existing date is found. ‘forwards’ means the search will go forwards in time. The default is ‘backwards’.

Returns:

str – The next possible existing date in the calendar of the dataset.
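
Examples

A sketch, assuming da uses a 360-day calendar in which “2005-12-31” does not exist:

from clisops.utils.dataset_utils import adjust_date_to_calendar

# Search backwards from a date missing in the calendar to the nearest existing one
adjust_date_to_calendar(da, "2005-12-31", direction="backwards")
# expected to return a date string such as "2005-12-30"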

clisops.utils.dataset_utils.calculate_offset(lon, first_element_value)[source]

Calculate the number of elements to roll the dataset by in order to have longitude from within requested bounds.

Parameters:
  • lon – Longitude coordinate of xarray dataset.

  • first_element_value – The value of the first element of the longitude array to roll to.

clisops.utils.dataset_utils.cf_convert_between_lon_frames(ds_in, lon_interval)[source]

Convert ds or lon_interval (whichever is deemed appropriate) to the other longitude frame, if the longitude frames do not match.

If ds and lon_interval are defined on different longitude frames ([-180, 180] and [0, 360]), this function will convert one of the input parameters to the other longitude frame, preferably the lon_interval. Adjusts shifted longitude frames [0-x, 360-x] in the dataset to one of the two standard longitude frames, dependent on the specified lon_interval. In case of curvilinear grids featuring an additional 1D x-coordinate of the projection, this projection x-coordinate will not get converted.

Parameters:
  • ds_in (xarray.Dataset or xarray.DataArray) – xarray data object with defined longitude dimension.

  • lon_interval (tuple or list) – Length-2 tuple or list of floats or integers denoting the bounds of the longitude interval.

Returns:

Tuple(ds, lon_low, lon_high) – The xarray.Dataset and the bounds of the longitude interval, potentially adjusted in terms of their defined longitude frame.
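
Examples

A sketch, assuming ds uses a [0, 360] longitude frame while the requested interval is given in [-180, 180]:

from clisops.utils.dataset_utils import cf_convert_between_lon_frames

# Either the dataset or the interval (preferably the interval) is converted
# so that both end up on the same longitude frame.
ds_out, lon_low, lon_high = cf_convert_between_lon_frames(ds, (-50.0, 50.0))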

clisops.utils.dataset_utils.check_lon_alignment(ds, lon_bnds)[source]

Check whether the longitude subset requested is within the bounds of the dataset.

If not, try to roll the dataset so that the requested range falls within its longitude bounds. Raise an exception if rolling is not possible.

clisops.utils.dataset_utils.detect_bounds(ds, coordinate) str | None[source]

Use cf_xarray to obtain the variable name of the requested coordinate’s bounds.

Parameters:
  • ds (xarray.Dataset, xarray.DataArray) – Dataset the coordinate bounds variable name shall be obtained from.

  • coordinate (str) – Name of the coordinate variable to determine the bounds from.

Returns:

str or None – Returns the variable name of the requested coordinate bounds. Returns None if the variable has no bounds or if they cannot be identified.

clisops.utils.dataset_utils.detect_coordinate(ds, coord_type)[source]

Use cf_xarray to obtain the variable name of the requested coordinate.

Parameters:
  • ds (xarray.Dataset, xarray.DataArray) – Dataset the coordinate variable name shall be obtained from.

  • coord_type (str) – Coordinate type understood by cf-xarray, e.g. ‘lat’, ‘lon’, …

Raises:

AttributeError – Raised if the requested coordinate cannot be identified.

Returns:

str – Coordinate variable name.
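
Examples

A minimal sketch combining detect_coordinate with detect_bounds; ds is a placeholder dataset with CF-compliant coordinate attributes:

from clisops.utils.dataset_utils import detect_bounds, detect_coordinate

lat_name = detect_coordinate(ds, "lat")  # e.g. "lat" or "latitude"
lon_name = detect_coordinate(ds, "lon")
lat_bnds_name = detect_bounds(ds, lat_name)  # None if no bounds can be identified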

clisops.utils.dataset_utils.detect_format(ds)[source]

Detect the format of a dataset. Currently supported formats are ‘CF’, ‘SCRIP’ and ‘xESMF’.

Parameters:

ds (xr.Dataset) – xarray.Dataset of which to detect the format.

Returns:

str – The detected format if supported; otherwise an exception is raised.

clisops.utils.dataset_utils.detect_gridtype(ds, lon, lat, lon_bnds=None, lat_bnds=None)[source]

Detect type of the grid as one of “regular_lat_lon”, “curvilinear”, “unstructured”.

Assumes the grid description / structure follows the CF conventions.

clisops.utils.dataset_utils.detect_shape(ds, lat, lon, grid_type) Tuple[int, int, int][source]

Detect the shape of the grid.

Returns a tuple of (nlat, nlon, ncells). For an unstructured grid nlat and nlon are not defined and therefore the returned tuple will be (ncells, ncells, ncells).

Parameters:
  • ds (xr.Dataset) – Dataset containing the grid / coordinate variables.

  • lat (str) – Latitude variable name.

  • lon (str) – Longitude variable name.

  • grid_type (str) – One of “regular_lat_lon”, “curvilinear”, “unstructured”

Returns:

  • int – Number of latitude points in the grid.

  • int – Number of longitude points in the grid.

  • int – Number of cells in the grid.

clisops.utils.dataset_utils.generate_bounds_curvilinear(ds, lat, lon)[source]

Compute bounds for curvilinear grids.

Assumes 2D latitude and longitude coordinate variables. The bounds will be attached as coords to the xarray.Dataset of the Grid object. If no bounds can be created, a warning is issued. It is assumed but not ensured that no duplicated cells are present in the grid.

The bound calculation for curvilinear grids was adapted from https://github.com/SantanderMetGroup/ATLAS/blob/mai-devel/scripts/ATLAS-data/bash-interpolation-scripts/AtlasCDOremappeR_CORDEX/grid_bounds_calc.py, which is based on work by Caillaud Cécile and Samuel Somot from Meteo-France.

Parameters:
  • ds (xarray.Dataset) – Dataset to generate the bounds for.

  • lat (str) – Latitude variable name.

  • lon (str) – Longitude variable name.

Returns:

xarray.Dataset – Dataset with attached bounds.

clisops.utils.dataset_utils.generate_bounds_rectilinear(ds, lat, lon)[source]

Compute bounds for rectilinear grids.

The bounds will be attached as coords to the xarray.Dataset of the Grid object. If no bounds can be created, a warning is issued. It is assumed but not ensured that no duplicated cells are present in the grid.

Parameters:
  • ds (xarray.Dataset) – Dataset to generate the bounds for.

  • lat (str) – Latitude variable name.

  • lon (str) – Longitude variable name.

Returns:

xarray.Dataset – Dataset with attached bounds.

clisops.utils.dataset_utils.reformat_SCRIP_to_CF(ds, keep_attrs=False)[source]

Reformat dataset from SCRIP to CF format.

Parameters:
  • ds (xarray.Dataset) – Input dataset in SCRIP format.

  • keep_attrs (bool) – Whether to keep the global attributes.

Returns:

ds_ref (xarray.Dataset) – Reformatted dataset.

clisops.utils.dataset_utils.reformat_xESMF_to_CF(ds, keep_attrs=False)[source]

Reformat dataset from xESMF to CF format.

Parameters:
  • ds (xarray.Dataset) – Input dataset in xESMF format.

  • keep_attrs (bool) – Whether to keep the global attributes.

Returns:

ds_ref (xarray.Dataset) – Reformatted dataset.

Time Utilities

clisops.utils.time_utils.create_time_bounds(ds, freq)[source]

Generate time bounds for datasets that have been temporally averaged.

Averaging frequencies supported are yearly, monthly and daily.
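
Examples

A sketch, assuming freq accepts the same values as clisops.ops.average.average_time (e.g. “month”) and ds_monthly holds monthly-averaged data:

from clisops.utils.time_utils import create_time_bounds

time_bnds = create_time_bounds(ds_monthly, freq="month")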