API
Core subset functionality
Subset module.
- clisops.core.subset.create_mask(*, x_dim: DataArray, y_dim: DataArray, poly: GeoDataFrame, wrap_lons: bool = False, check_overlap: bool = False) DataArray [source]
Create a mask with values corresponding to the features in a GeoDataFrame using vectorized methods.
The returned mask’s points have the value of the first geometry of poly they fall in.
- Parameters:
x_dim (xarray.DataArray) – X or longitudinal dimension of the xarray object.
y_dim (xarray.DataArray) – Y or latitudinal dimension of the xarray object.
poly (gpd.GeoDataFrame) – A GeoDataFrame used to create the xarray.DataArray mask. If its index doesn’t have an integer dtype, it will be reset to integers, which will be used in the mask.
wrap_lons (bool) – Shift vector longitudes from the [-180, 180] frame to the [0, 360] frame. Default = False.
check_overlap (bool) – Perform a check to verify if shapes contain overlapping geometries.
- Returns:
xarray.DataArray
Examples
import geopandas as gpd
import xarray as xr

from clisops.core.subset import create_mask

ds = xr.open_dataset(path_to_tasmin_file)
polys = gpd.read_file(path_to_multi_shape_file)

# Get a mask from all polygons in the shape file
mask = create_mask(x_dim=ds.lon, y_dim=ds.lat, poly=polys)
ds = ds.assign_coords(regions=mask)

# Operations can be applied to each region with `groupby`, e.g.:
ds = ds.groupby("regions").mean()

# Extra step to retrieve the names of those polygons stored in another column (here "id")
region_names = xr.DataArray(polys.id, dims=("regions",))
ds = ds.assign_coords(regions_names=region_names)
- clisops.core.subset.create_weight_masks(ds_in: DataArray | Dataset, poly: GeoDataFrame) DataArray [source]
Create weight masks corresponding to the features in a GeoDataFrame using xESMF.
The values of the returned masks are the fraction of the corresponding polygon’s area that is covered by the grid cell. Summing along the spatial dimension gives 1 for each geometry. Requires xESMF.
- Parameters:
ds_in (Union[xarray.DataArray, xarray.Dataset]) – xarray object containing the grid information, as understood by xESMF. For 2D lat/lon coordinates, the bounds arrays are required.
poly (gpd.GeoDataFrame) – GeoDataFrame used to create the xarray.DataArray mask. One mask will be created for each row in the dataframe. Will be converted to EPSG:4326 if needed.
- Returns:
xarray.DataArray – Has a new geom dimension corresponding to the index of the input GeoDataframe. Non-geometry columns of poly are copied as auxiliary coordinates.
Examples
import geopandas as gpd
import xarray as xr

from clisops.core.subset import create_weight_masks

ds = xr.open_dataset(path_to_tasmin_file)
polys = gpd.read_file(path_to_multi_shape_file)

# Get a weight mask for each polygon in the shape file
mask = create_weight_masks(ds_in=ds, poly=polys)
- clisops.core.subset.distance(da: DataArray | Dataset, *, lon: float | Sequence[float] | DataArray, lat: float | Sequence[float] | DataArray) DataArray | Dataset [source]
Return distance to a point in meters.
- Parameters:
da (Union[xarray.DataArray, xarray.Dataset]) – Input data.
lon (Union[float, Sequence[float], xarray.DataArray]) – Longitude coordinate.
lat (Union[float, Sequence[float], xarray.DataArray]) – Latitude coordinate.
- Returns:
xarray.DataArray – Distance in meters to point.
Examples
import numpy as np
import xarray as xr

from clisops.core.subset import distance

# To get the indices of the closest point, use:
da = xr.open_dataset(path_to_pr_file).pr
d = distance(da, lon=-75, lat=45)
k = d.argmin()
i, j, _ = np.unravel_index(k, d.shape)
- clisops.core.subset.shape_bbox_indexer(ds: Dataset, poly: GeoDataFrame | GeoSeries | GeometryArray)[source]
Return a spatial indexer that selects the indices of the grid cells covering the given geometries.
- Parameters:
ds (xr.Dataset) – Input dataset.
poly (gpd.GeoDataFrame, gpd.GeoSeries, gpd.array.GeometryArray, or list of shapely geometries) – Shapes to cover. Can be of type Polygon, MultiPolygon, Point, or MultiPoint.
- Returns:
dict – xarray indexer along native dataset coordinates, to be used as an argument to isel.
Examples
indexer = shape_bbox_indexer(ds, poly)
ds.isel(indexer)
Notes
This is used in particular to restrict the domain of a dataset before computing the weights for a spatial average.
- clisops.core.subset.subset_bbox(da: DataArray | Dataset, lon_bnds: array | tuple[float | None, float | None] = None, lat_bnds: array | tuple[float | None, float | None] = None, start_date: str | None = None, end_date: str | None = None, first_level: int | float | None = None, last_level: int | float | None = None, time_values: Sequence[str] | None = None, level_values: Sequence[float] | Sequence[int] | None = None) DataArray | Dataset [source]
Subset a DataArray or Dataset spatially (and temporally) using a lat lon bounding box and date selection.
Return a subset of a DataArray or Dataset for grid points falling within a spatial bounding box defined by longitudinal and latitudinal bounds, and for dates falling within provided bounds.
In the case of a rectilinear lat-lon grid, this reduces to a simple selection along the latitude and longitude dimensions.
- Parameters:
da (Union[xarray.DataArray, xarray.Dataset]) – Input data.
lon_bnds (Union[np.array, Tuple[Optional[float], Optional[float]]]) – List of minimum and maximum longitudinal bounds. Optional. Defaults to all longitudes in original data-array.
lat_bnds (Union[np.array, Tuple[Optional[float], Optional[float]]]) – List of minimum and maximum latitudinal bounds. Optional. Defaults to all latitudes in original data-array.
start_date (Optional[str]) – Start date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day(“%Y-%m-%d”). Defaults to first day of input data-array.
end_date (Optional[str]) – End date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day(“%Y-%m-%d”). Defaults to last day of input data-array.
first_level (Optional[Union[int, float]]) – First level of the subset. Can be either an integer or float. Defaults to first level of input data-array.
last_level (Optional[Union[int, float]]) – Last level of the subset. Can be either an integer or float. Defaults to last level of input data-array.
time_values (Optional[Sequence[str]]) – A list of datetime strings to subset.
level_values (Optional[Union[Sequence[float], Sequence[int]]]) – A list of level values to select.
- Returns:
Union[xarray.DataArray, xarray.Dataset] – Subsetted xarray.DataArray or xarray.Dataset
Notes
subset_bbox expects the lower and upper bounds to be provided in ascending order. If the actual coordinate values are descending then this will be detected and your selection reversed before the data subset is returned.
Examples
import xarray as xr

from clisops.core.subset import subset_bbox

ds = xr.open_dataset(path_to_pr_file)

# Subset lat lon
prSub = subset_bbox(ds.pr, lon_bnds=[-75, -70], lat_bnds=[40, 45])
- clisops.core.subset.subset_gridpoint(da: DataArray | Dataset, lon: float | Sequence[float] | DataArray | None = None, lat: float | Sequence[float] | DataArray | None = None, start_date: str | None = None, end_date: str | None = None, first_level: int | float | None = None, last_level: int | float | None = None, tolerance: float | None = None, add_distance: bool = False) DataArray | Dataset [source]
Extract one or more of the nearest gridpoint(s) from a DataArray or Dataset based on lat/lon coordinate(s).
Return a subsetted DataArray (or Dataset) for the grid point(s) falling nearest the input longitude and latitude coordinates. Optionally subset the data for dates falling within the provided bounds. If 1D sequences of coordinates are given, the gridpoints will be concatenated along a new dimension “site”.
- Parameters:
da (Union[xarray.DataArray, xarray.Dataset]) – Input data.
lon (Optional[Union[float, Sequence[float], xarray.DataArray]]) – Longitude coordinate(s). Must be of the same length as lat.
lat (Optional[Union[float, Sequence[float], xarray.DataArray]]) – Latitude coordinate(s). Must be of the same length as lon.
start_date (Optional[str]) – Start date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day(“%Y-%m-%d”). Defaults to first day of input data-array.
end_date (Optional[str]) – End date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day(“%Y-%m-%d”). Defaults to last day of input data-array.
first_level (Optional[Union[int, float]]) – First level of the subset. Can be either an integer or float. Defaults to first level of input data-array.
last_level (Optional[Union[int, float]]) – Last level of the subset. Can be either an integer or float. Defaults to last level of input data-array.
tolerance (Optional[float]) – Masks values if the distance to the nearest gridpoint is larger than tolerance in meters.
add_distance (bool) – If True, the distance to the nearest grid point is added to the output. Default: False.
- Returns:
Union[xarray.DataArray, xarray.Dataset] – Subsetted xarray.DataArray or xarray.Dataset
Examples
import xarray as xr

from clisops.core.subset import subset_gridpoint

ds = xr.open_dataset(path_to_pr_file)

# Subset lat lon point
prSub = subset_gridpoint(ds.pr, lon=-75, lat=45)

# Subset multiple variables in a single dataset
ds = xr.open_mfdataset([path_to_tasmax_file, path_to_tasmin_file])
dsSub = subset_gridpoint(ds, lon=-75, lat=45)
- clisops.core.subset.subset_level(da: DataArray | Dataset, first_level: int | float | str | None = None, last_level: int | float | str | None = None) DataArray | Dataset [source]
Subset input DataArray or Dataset based on first and last levels.
Return a subset of a DataArray or Dataset for levels falling within the provided bounds.
- Parameters:
da (Union[xarray.DataArray, xarray.Dataset]) – Input data.
first_level (Optional[Union[int, float, str]]) – First level of the subset (specified as the value, not the index). Can be an integer, float or string. Defaults to first level of input data-array.
last_level (Optional[Union[int, float, str]]) – Last level of the subset (specified as the value, not the index). Can be an integer, float or string. Defaults to last level of input data-array.
- Returns:
Union[xarray.DataArray, xarray.Dataset] – Subsetted xarray.DataArray or xarray.Dataset
Examples
import xarray as xr

from clisops.core.subset import subset_level

ds = xr.open_dataset(path_to_pr_file)

# Subset complete levels
prSub = subset_level(ds.pr, first_level=0, last_level=30)

# Subset single level
prSub = subset_level(ds.pr, first_level=1000, last_level=1000)

# Subset multiple variables in a single dataset
ds = xr.open_mfdataset([path_to_tasmax_file, path_to_tasmin_file])
dsSub = subset_level(ds, first_level=1000.0, last_level=850.0)
- clisops.core.subset.subset_level_by_values(da: DataArray | Dataset, level_values: Sequence[float] | Sequence[int] | None = None) DataArray | Dataset [source]
Subset input DataArray or Dataset based on a sequence of vertical level values.
Return a subset of a DataArray or Dataset for levels matching those requested.
- Parameters:
da (Union[xarray.DataArray, xarray.Dataset]) – Input data.
level_values (Optional[Union[Sequence[float], Sequence[int]]]) – A list of level values to select.
- Returns:
Union[xarray.DataArray, xarray.Dataset] – Subsetted xarray.DataArray or xarray.Dataset
Examples
import xarray as xr

from clisops.core.subset import subset_level_by_values

ds = xr.open_dataset(path_to_pr_file)

# Subset a selection of levels
levels = [1000.0, 850.0, 250.0, 100.0]
prSub = subset_level_by_values(ds.pr, level_values=levels)
Notes
If any levels are not found, a ValueError will be raised. The requested levels will automatically be re-ordered to match the order in the input dataset.
- clisops.core.subset.subset_shape(ds: DataArray | Dataset, shape: str | Path | GeoDataFrame, raster_crs: str | int | None = None, shape_crs: str | int | None = None, buffer: int | float | None = None, start_date: str | None = None, end_date: str | None = None, first_level: int | float | None = None, last_level: int | float | None = None) DataArray | Dataset [source]
Subset a DataArray or Dataset spatially (and temporally) using a vector shape and date selection.
Return a subset of a DataArray or Dataset for grid points falling within the area of a Polygon and/or MultiPolygon shape, or grid points along the path of a LineString and/or MultiLineString. If the shape consists of several disjoint polygons, the output is cut to the smallest bbox including all polygons.
- Parameters:
ds (Union[xarray.DataArray, xarray.Dataset]) – Input values.
shape (Union[str, Path, gpd.GeoDataFrame]) – Path to a shape file, or GeoDataFrame directly. Supports GeoPandas-compatible formats.
raster_crs (Optional[Union[str, int]]) – EPSG number or PROJ4 string of the input data’s coordinate reference system.
shape_crs (Optional[Union[str, int]]) – EPSG number or PROJ4 string of the shape’s coordinate reference system.
buffer (Optional[Union[int, float]]) – Buffer the shape in order to select a larger region stemming from it. Units are those of the shape’s coordinate reference system (degrees or metres).
start_date (Optional[str]) – Start date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day(“%Y-%m-%d”). Defaults to first day of input data-array.
end_date (Optional[str]) – End date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day(“%Y-%m-%d”). Defaults to last day of input data-array.
first_level (Optional[Union[int, float]]) – First level of the subset. Can be either an integer or float. Defaults to first level of input data-array.
last_level (Optional[Union[int, float]]) – Last level of the subset. Can be either an integer or float. Defaults to last level of input data-array.
- Returns:
Union[xarray.DataArray, xarray.Dataset] – A subset of ds
Notes
If no CRS is found in the shape provided (e.g. RFC-7946 GeoJSON, https://en.wikipedia.org/wiki/GeoJSON), assumes a decimal degree datum (CRS84). Be advised that EPSG:4326 and OGC:CRS84 are not identical as axis order of lat and long differs between the two (for more information, see: https://github.com/OSGeo/gdal/issues/2035).
Examples
import xarray as xr

from clisops.core.subset import subset_shape

pr = xr.open_dataset(path_to_pr_file).pr

# Subset data array by shape
prSub = subset_shape(pr, shape=path_to_shape_file)

# Subset data array by shape and single year
prSub = subset_shape(pr, shape=path_to_shape_file, start_date="1990-01-01", end_date="1990-12-31")

# Subset multiple variables in a single dataset
ds = xr.open_mfdataset([path_to_tasmin_file, path_to_tasmax_file])
dsSub = subset_shape(ds, shape=path_to_shape_file)
- clisops.core.subset.subset_time(da: DataArray | Dataset, start_date: str | None = None, end_date: str | None = None) DataArray | Dataset [source]
Subset input DataArray or Dataset based on start and end years.
Return a subset of a DataArray or Dataset for dates falling within the provided bounds.
- Parameters:
da (Union[xarray.DataArray, xarray.Dataset]) – Input data.
start_date (Optional[str]) – Start date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day(“%Y-%m-%d”). Defaults to first day of input data-array.
end_date (Optional[str]) – End date of the subset. Date string format – can be year (“%Y”), year-month (“%Y-%m”) or year-month-day(“%Y-%m-%d”). Defaults to last day of input data-array.
- Returns:
Union[xarray.DataArray, xarray.Dataset] – Subsetted xarray.DataArray or xarray.Dataset
Examples
import xarray as xr

from clisops.core.subset import subset_time

ds = xr.open_dataset(path_to_pr_file)

# Subset complete years
prSub = subset_time(ds.pr, start_date="1990", end_date="1999")

# Subset a single complete year
prSub = subset_time(ds.pr, start_date="1990", end_date="1990")

# Subset multiple variables in a single dataset
ds = xr.open_mfdataset([path_to_tasmax_file, path_to_tasmin_file])
dsSub = subset_time(ds, start_date="1990", end_date="1999")

# Subset with year-month precision - e.g. 1990-03-01 to 1999-08-31 inclusive
txSub = subset_time(ds.tasmax, start_date="1990-03", end_date="1999-08")

# Subset with specific start and end dates
tnSub = subset_time(ds.tasmin, start_date="1990-03-13", end_date="1990-08-17")
Notes
Avoid “%Y-%m-31” as a date string; if you want a complete month, use only the “%Y-%m” form.
- clisops.core.subset.subset_time_by_components(da: DataArray | Dataset, *, time_components: dict | None = None) DataArray [source]
Subset by one or more time components (year, month, day, etc.).
- Parameters:
da (Union[xarray.DataArray, xarray.Dataset]) – Input data.
time_components (Union[Dict, None] = None) – Dictionary of time components to select, e.g. {“month”: [12, 1, 2]} to select December, January and February.
- Returns:
xarray.DataArray
Examples
import xarray as xr

from clisops.core.subset import subset_time_by_components

# To select all winter months (Dec, Jan, Feb), i.e. [12, 1, 2]:
da = xr.open_dataset(path_to_file).pr
winter_dict = {"month": [12, 1, 2]}
res = subset_time_by_components(da, time_components=winter_dict)
- clisops.core.subset.subset_time_by_values(da: DataArray | Dataset, time_values: Sequence[str] | None = None) DataArray | Dataset [source]
Subset input DataArray or Dataset based on a sequence of datetime strings.
Return a subset of a DataArray or Dataset for datetimes matching those requested.
- Parameters:
da (Union[xarray.DataArray, xarray.Dataset]) – Input data.
time_values (Optional[Sequence[str]]) – A list of datetime strings to select. Default: None.
- Returns:
Union[xarray.DataArray, xarray.Dataset] – Subsetted xarray.DataArray or xarray.Dataset
Examples
import xarray as xr

from clisops.core.subset import subset_time_by_values

ds = xr.open_dataset(path_to_pr_file)

# Subset a selection of datetimes
times = ["2015-01-01", "2018-12-05", "2021-06-06"]
prSub = subset_time_by_values(ds.pr, time_values=times)
Notes
If any datetimes are not found, a ValueError will be raised. The requested datetimes will automatically be re-ordered to match the order in the input dataset.
Core average functionality
Average module.
- clisops.core.average.average_over_dims(ds: DataArray | Dataset, dims: Sequence[str] = None, ignore_undetected_dims: bool = False) DataArray | Dataset [source]
Average a DataArray or Dataset over the dimensions specified.
- Parameters:
ds (Union[xr.DataArray, xr.Dataset]) – Input values.
dims (Sequence[{“time”, “level”, “latitude”, “longitude”}]) – The dimensions over which to apply the average. If None, none of the dimensions are averaged over. Dimensions must be one of [“time”, “level”, “latitude”, “longitude”].
ignore_undetected_dims (bool) – If False, an exception will be raised if any of the specified dimensions cannot be found in the dataset. If True, no exception is raised and the average is applied over the dimensions that were found. Default = False
- Returns:
Union[xr.DataArray, xr.Dataset] – New Dataset or DataArray object averaged over the indicated dimensions. The indicated dimensions will have been removed.
Examples
import xarray as xr

from clisops.core.average import average_over_dims

pr = xr.open_dataset(path_to_pr_file).pr

# Average data array over latitude and longitude
prAvg = average_over_dims(pr, dims=["latitude", "longitude"], ignore_undetected_dims=True)
- clisops.core.average.average_shape(ds: Dataset, shape: str | Path | GeoDataFrame, variable: str | Sequence[str] = None) DataArray | Dataset [source]
Average a DataArray or Dataset spatially using vector shapes.
Return a DataArray or Dataset averaged over each Polygon given. Requires xESMF.
- Parameters:
ds (xarray.Dataset) – Input values, coordinate attributes must be CF-compliant.
shape (Union[str, Path, gpd.GeoDataFrame]) – Path to shape file, or directly a GeoDataFrame. Supports formats compatible with geopandas. Will be converted to EPSG:4326 if needed.
variable (Union[str, Sequence[str], None]) – Variables to average. If None, average over all data variables.
- Returns:
Union[xarray.DataArray, xarray.Dataset] – ds spatially-averaged over the polygon(s) in shape. Has a new geom dimension corresponding to the index of the input GeoDataFrame. Non-geometry columns of the GeoDataFrame are copied as auxiliary coordinates.
Notes
The spatial weights are computed with ESMF, which uses corners given in lat/lon format (EPSG:4326); the input dataset ds must provide those. In contrast to subset.subset_shape, the weights computed here take partial overlaps and holes into account.
As xESMF computes the weight masks only once, skipping missing values is not really feasible. Thus, all NaNs propagate when performing the average.
Examples
import xarray as xr

from clisops.core.average import average_shape

pr = xr.open_dataset(path_to_pr_file).pr

# Average data array over shape
prAvg = average_shape(pr, shape=path_to_shape_file)

# Average multiple variables in a single dataset
ds = xr.open_mfdataset([path_to_tasmin_file, path_to_tasmax_file])
dsAvg = average_shape(ds, shape=path_to_shape_file)
- clisops.core.average.average_time(ds: DataArray | Dataset, freq: str) DataArray | Dataset [source]
Average a DataArray or Dataset over the time frequency specified.
- Parameters:
ds (Union[xr.DataArray, xr.Dataset]) – Input values.
freq (str) – The frequency to average over. One of “month”, “year”.
- Returns:
Union[xr.DataArray, xr.Dataset] – New Dataset or DataArray object averaged over the indicated time frequency.
Examples
import xarray as xr

from clisops.core.average import average_time

pr = xr.open_dataset(path_to_pr_file).pr

# Average data array over each month
prAvg = average_time(pr, freq="month")
Core regrid functionality
Regrid module.
- class clisops.core.regrid.Grid(ds: Dataset | DataArray | None = None, grid_id: str | None = None, grid_instructor: tuple | float | int | None = None, compute_bounds: bool | None = False, mask: str | None = None)[source]
Bases:
object
Create a Grid object that is suitable to serve as source or target grid of the Weights class.
Pre-processes coordinate variables of input dataset (e.g. create or read dataset from input, reformat, generate bounds, identify duplicated and collapsing cells, determine zonal / east-west extent).
- Parameters:
ds (xr.Dataset or xr.DataArray, optional) – Uses horizontal coordinates of an xarray.Dataset or xarray.DataArray to create a Grid object. The default is None.
grid_id (str, optional) – Create the Grid object from a selection of pre-defined grids, e.g. “1deg” or “2pt5deg”. The grids are provided via the roocs_grids package (https://github.com/roocs/roocs-grids). A special setting is “adaptive”/”auto”, which requires the parameter ‘ds’ to be specified as well, and creates a regular lat-lon grid of the same extent and approximate resolution as the grid described by ‘ds’. The default is None.
grid_instructor (tuple, float or int, optional) – Create a regional or global regular lat-lon grid using xESMF utility functions. - Global grid: grid_instructor = (lon_step, lat_step) or grid_instructor = step - Regional grid: grid_instructor = (lon_start, lon_end, lon_step, lat_start, lat_end, lat_step) or grid_instructor = (start, end, step). The default is None.
compute_bounds (bool, optional) – Compute latitude and longitude bounds if the dataset has none defined. The default is False.
mask (str, optional) – Whether to mask “ocean” cells or “land” cells if a mask variable is found. The default is None.
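A minimal sketch of the documented ways to construct a Grid (path_to_file is a placeholder; xESMF and the roocs_grids package are assumed to be installed):
import xarray as xr

from clisops.core.regrid import Grid

ds = xr.open_dataset(path_to_file)
grid_src = Grid(ds=ds)  # use the horizontal coordinates of an existing dataset
grid_1deg = Grid(grid_id="1deg")  # pre-defined 1-degree grid from roocs_grids
grid_half = Grid(grid_instructor=0.5)  # global regular lat-lon grid with 0.5 degree steps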
- compare_grid(ds_or_Grid: Dataset | Grid, verbose: bool = False) bool [source]
Compare two Grid objects.
Will compare the checksums of two Grid objects, which depend on the lat and lon coordinate variables, their bounds and, if defined, a mask.
- Parameters:
ds_or_Grid (xarray.Dataset or Grid) – Grid that the current Grid object shall be compared to.
verbose (bool) – Whether to also print the result. The default is False.
- Returns:
bool – Returns True if the two Grids are considered identical within the defined precision, else returns False.
- detect_bounds(coordinate: str) str | None [source]
Use cf_xarray to obtain the variable name of the requested coordinates bounds.
- Parameters:
coordinate (str) – Name of the coordinate variable to determine the bounds from.
- Returns:
str, optional – Returns the variable name of the requested coordinate bounds. Returns None if the variable has no bounds or if they cannot be identified.
- detect_coordinate(coord_type: str) str [source]
Use cf_xarray to obtain the variable name of the requested coordinate.
- Parameters:
coord_type (str) – Coordinate type, e.g. ‘latitude’, ‘longitude’, ‘level’, ‘time’.
- Raises:
KeyError – Raised if the requested coordinate cannot be identified.
- Returns:
str – Coordinate variable name.
- detect_extent() str [source]
Determine the grid extent in zonal / east-west direction (‘regional’ or ‘global’).
- Returns:
str – ‘regional’ or ‘global’.
- detect_format() str [source]
Detect the format of a dataset. Currently supported formats are ‘CF’, ‘SCRIP’ and ‘xESMF’.
- Returns:
str – The detected format, if supported; otherwise an exception is raised.
- detect_shape() tuple[int, int, int] [source]
Detect the shape of the grid.
Returns a tuple of (nlat, nlon, ncells). For an unstructured grid nlat and nlon are not defined and therefore the returned tuple will be (ncells, ncells, ncells).
- Returns:
int – Number of latitude points in the grid.
int – Number of longitude points in the grid.
int – Number of cells in the grid.
- detect_type() str [source]
Detect type of the grid as one of “regular_lat_lon”, “curvilinear”, or “unstructured”.
Raises an exception if the grid type is not supported.
- Returns:
str – The detected grid type.
- grid_reformat(grid_format: str, keep_attrs: bool = False)[source]
Reformat the Dataset attached to the Grid object to a target format.
- Parameters:
grid_format (str) – Target format of the reformat operation. Currently supported formats are ‘SCRIP’, ‘CF’ and ‘xESMF’.
keep_attrs (bool) – Whether to keep the global attributes.
- Returns:
ds_ref (xarray.Dataset) – Reformatted dataset.
- static is_smashed_quad2D(coords)[source]
Determine if a quadrilateral (quad) is smashed or degenerate.
- Parameters:
coords (numpy.ndarray) – Array of shape (4, 2) representing the coordinates of the four corners of the quad.
- Returns:
bool – True if the quad is smashed, otherwise False.
- static points_equal(p1, p2, tol=1e-15)[source]
Check if two points are equal within a given tolerance.
- to_netcdf(folder: str | Path | None = './', filename: str | None = '', grid_format: str | None = 'CF', keep_attrs: bool | None = True)[source]
Store a copy of the horizontal Grid as netCDF file on disk.
Define output folder, filename and output format (currently only ‘CF’ is supported). Does not overwrite an existing file.
- Parameters:
folder (str or Path, optional) – Output folder. The default is the current working directory “./”.
filename (str, optional) – Output filename, to be defined separately from folder. The default is ‘grid_<grid.id>.nc’.
grid_format (str, optional) – The format the grid information shall be stored as (in terms of variable attributes and dimensions). The default is “CF”, which is currently the only supported output format.
keep_attrs (bool, optional) – Whether to store the global attributes in the output netCDF file. The default is True.
- class clisops.core.regrid.Weights(grid_in: Grid, grid_out: Grid, method: str, from_disk: str | Path | None = None, format: str | None = None)[source]
Bases:
object
Create remapping weights from two Grid objects serving as source and target grid.
Reads weights from cache if possible. Reads weights from disk if specified (not yet implemented). In the latter case, the weight file format must be supported so that it can be reformatted to the xESMF format.
- Parameters:
grid_in (Grid) – Grid object serving as source grid.
grid_out (Grid) – Grid object serving as target grid.
method (str) – Remapping method the weights should be / have been calculated with. One of [“nearest_s2d”, “bilinear”, “conservative”, “patch”] if weights have to be calculated. Free text if weights are read from disk.
from_disk (str, optional) – Not yet implemented. Instead of calculating the regridding weights (or reading them from the cache), read them from disk. The default is None.
format (str, optional) – Not yet implemented. When reading weights from disk, the input format may be specified. If omitted, there will be an attempt to detect the format. The default is None.
- clisops.core.regrid.regrid(grid_in: Grid, grid_out: Grid, weights: Weights, adaptive_masking_threshold: float | None = 0.5, keep_attrs: bool | str = True) Dataset [source]
Perform regridding operation including dealing with dataset and variable attributes.
- Parameters:
grid_in (Grid) – The Grid object of the source grid, e.g. created out of source xarray.Dataset.
grid_out (Grid) – The Grid object of the target grid.
weights (Weights) – The Weights object, as created by using grid_in and grid_out Grid objects as input.
adaptive_masking_threshold (float, optional) – (AMT) A value within the [0., 1.] interval that defines the maximum RATIO of missing values amongst the total number of data values contributing to the calculation of a target grid cell value. For a missing-value fraction RATIO within [0., AMT[, the target grid cell value is re-normalized by the factor 1./(1.-RATIO); otherwise it is set to missing_value. Thus, if AMT is set to 1, all source grid cells that contribute to a target grid cell must be missing in order for the target grid cell to be defined as missing itself. Values greater than 1 or less than 0 turn adaptive masking off. This adaptive masking technique allows reusing generated weights for differently masked data (e.g. land-sea masks or orographic masks that vary with depth / height). The default is 0.5.
keep_attrs (bool or str) – Sets the global attributes of the resulting dataset, apart from the ones set by this routine: True: attributes of grid_in.ds will be in the resulting dataset; False: no attributes apart from the ones newly set by this routine; “target”: attributes of grid_out.ds will be in the resulting dataset. The default is True.
- Returns:
xarray.Dataset – The regridded data in the form of an xarray.Dataset.
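As an illustrative sketch (not taken from the library’s docstrings), the typical workflow chains Grid, Weights and regrid; path_to_file and the grid/method choices below are placeholders:
import xarray as xr

from clisops.core.regrid import Grid, Weights, regrid

ds = xr.open_dataset(path_to_file)
grid_in = Grid(ds=ds)  # source grid taken from the input dataset
grid_out = Grid(grid_id="1deg")  # pre-defined 1-degree target grid
weights = Weights(grid_in=grid_in, grid_out=grid_out, method="bilinear")
ds_out = regrid(grid_in, grid_out, weights, adaptive_masking_threshold=0.5)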
- clisops.core.regrid.weights_cache_flush(weights_dir_init: str | Path | None = '', dryrun: bool | None = False, verbose: bool | None = False) None [source]
Flush and reinitialize the local weights cache.
- Parameters:
weights_dir_init (str, optional) – Directory name to reinitialize the local weights cache in. Will be created if it does not exist. The default is CONFIG[“clisops:grid_weights”][“local_weights_dir”] as defined in roocs.ini (or as redefined by a manual weights_cache_init call).
dryrun (bool, optional) – If True, it will only print all files that would get deleted. The default is False.
verbose (bool, optional) – If True, and dryrun is False, will print all files that are getting deleted. The default is False.
- Returns:
None
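For example, a dry run can be used to preview which cached weight files would be removed (a sketch):
from clisops.core.regrid import weights_cache_flush

weights_cache_flush(dryrun=True)  # only list the files that would be deleted
weights_cache_flush()  # flush and reinitialize the cache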
- clisops.core.regrid.weights_cache_init(weights_dir: str | Path | None = None, config: dict = CONFIG) None [source]
Initialize global variable weights_dir as used by the Weights class.
- Parameters:
weights_dir (str or Path) – Directory name to initialize the local weights cache in. Will be created if it does not exist. By default, this function is called upon import with weights_dir as defined in roocs.ini.
config (dict) – Configuration dictionary as read from top-level.
- Returns:
None
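A short sketch of re-initializing the weights cache in a custom directory (the path is a placeholder):
from clisops.core.regrid import weights_cache_init

weights_cache_init("/tmp/clisops_grid_weights")  # created if it does not exist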
Subset operation
- class clisops.ops.subset.Subset(ds, file_namer: str = 'standard', split_method: str = 'time:auto', output_dir: str | Path | None = None, output_type: str = 'netcdf', **params)[source]
Bases:
Operation
- clisops.ops.subset.subset(ds: Dataset | str | Path, *, time: str | tuple[str, str] | TimeParameter | Series | Interval | None = None, area: str | tuple[int | float | str, int | float | str, int | float | str, int | float | str] | AreaParameter | None = None, level: str | tuple[int | float | str, int | float | str] | LevelParameter | Interval | None = None, time_components: str | dict | TimeComponents | TimeComponentsParameter | None = None, output_dir: str | Path | None = None, output_type='netcdf', split_method='time:auto', file_namer='standard') list[Dataset | str] [source]
Subset operation.
- Parameters:
ds (Union[xr.Dataset, str])
time (Optional[Union[str, Tuple[str, str], TimeParameter, Series, Interval]] = None,)
area (str or AreaParameter or Tuple[Union[int, float, str], Union[int, float, str], Union[int, float, str], Union[int, float, str]], optional)
level (Optional[Union[str, Tuple[Union[int, float, str], Union[int, float, str]], LevelParameter, Interval] = None,)
time_components (Optional[Union[str, Dict, TimeComponentsParameter]] = None,)
output_dir (Optional[Union[str, Path]] = None)
output_type ({“netcdf”, “nc”, “zarr”, “xarray”})
split_method ({“time:auto”})
file_namer ({“standard”, “simple”})
- Returns:
List[Union[xr.Dataset, str]] – A list of the subsetted outputs in the format selected; str corresponds to file paths if the output format selected is a file.
Examples
- ds: xarray Dataset or "cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.latest.tas"
- time: ("1999-01-01T00:00:00", "2100-12-30T00:00:00") or "2085-01-01T12:00:00Z/2120-12-30T12:00:00Z"
- area: (-5., 49., 10., 65.) or "0.,49.,10.,65" or [0, 49.5, 10, 65], with the order being lon_0, lat_0, lon_1, lat_1
- level: (1000.,) or "1000/2000" or ("1000.50", "2000.60")
- time_components: "year:2000,2004,2008|month:01,02" or {"year": (2000, 2004, 2008), "months": (1, 2)}
- output_dir: "/cache/wps/procs/req0111"
- output_type: "netcdf"
- split_method: "time:auto"
- file_namer: "standard"
Note
If you request a selection range (such as level, latitude or longitude) that specifies the lower and upper bounds in the opposite direction to the actual coordinate values then clisops.ops.subset will detect this issue and reverse your selection before returning the data subset.
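A minimal runnable sketch of the call (path and bounds are placeholders; output_type="xarray" keeps the results in memory):
import xarray as xr

from clisops.ops.subset import subset

ds = xr.open_dataset(path_to_file)  # placeholder path
results = subset(
    ds=ds,
    time=("2010-01-01", "2019-12-31"),
    area=(-10.0, 40.0, 20.0, 65.0),  # lon_0, lat_0, lon_1, lat_1
    output_type="xarray",
)
ds_sub = results[0]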
Average operations
- clisops.ops.average.average_over_dims(ds: Dataset | str, dims: Sequence[str] | DimensionParameter | None = None, ignore_undetected_dims: bool = False, output_dir: str | Path | None = None, output_type: str = 'netcdf', split_method: str = 'time:auto', file_namer: str = 'standard') list[Dataset | str] [source]
Calculate an average over given dimensions.
- Parameters:
ds (Union[xr.Dataset, str]) – Xarray dataset.
dims (Optional[Union[Sequence[{“time”, “level”, “latitude”, “longitude”}], DimensionParameter]]) – The dimensions over which to apply the average. If None, none of the dimensions are averaged over. Dimensions must be one of [“time”, “level”, “latitude”, “longitude”].
ignore_undetected_dims (bool) – If False, an exception will be raised if any of the specified dimensions cannot be found in the dataset. If True, no exception is raised and the average is applied over the dimensions that were found. Default = False
output_dir (Optional[Union[str, Path]])
output_type ({“netcdf”, “nc”, “zarr”, “xarray”})
split_method ({“time:auto”})
file_namer ({“standard”, “simple”})
- Returns:
List[Union[xr.Dataset, str]] – A list of the outputs in the format selected; str corresponds to file paths if the output format selected is a file.
Examples
- ds: xarray Dataset or "cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.latest.tas"
- dims: ["latitude", "longitude"]
- ignore_undetected_dims: False
- output_dir: "/cache/wps/procs/req0111"
- output_type: "netcdf"
- split_method: "time:auto"
- file_namer: "standard"
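As a sketch (placeholder path; output_type="xarray" returns the averaged dataset in memory):
import xarray as xr

from clisops.ops.average import average_over_dims

ds = xr.open_dataset(path_to_pr_file)  # placeholder path
results = average_over_dims(
    ds, dims=["latitude", "longitude"], ignore_undetected_dims=True, output_type="xarray"
)
ds_avg = results[0]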
- clisops.ops.average.average_shape(ds: Dataset | Path | str, shape: str | Path | GeoDataFrame, variable: str | Sequence[str] | None = None, output_dir: str | Path | None = None, output_type: str = 'netcdf', split_method: str = 'time:auto', file_namer: str = 'standard') list[Dataset | str] [source]
Calculate a spatial average over a given shape.
- Parameters:
ds (Union[xr.Dataset, str]) – Xarray dataset.
shape (Union[str, Path, gpd.GeoDataFrame]) – Path to shape file, or directly a GeoDataFrame. Supports formats compatible with geopandas. Will be converted to EPSG:4326 if needed.
variable (Optional[Union[str, Sequence[str], None]]) – Variables to average. If None, average over all data variables.
output_dir (Optional[Union[str, Path]])
output_type ({“netcdf”, “nc”, “zarr”, “xarray”})
split_method ({“time:auto”})
file_namer ({“standard”, “simple”})
- Returns:
List[Union[xr.Dataset, str]] – A list of the outputs in the format selected; str corresponds to file paths if the output format selected is a file.
Examples
- ds: xarray Dataset or "cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.latest.tas"
- shape: "path_to_shape_file"
- output_dir: "/cache/wps/procs/req0111"
- output_type: "netcdf"
- split_method: "time:auto"
- file_namer: "standard"
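As a sketch (placeholder paths; output_type="xarray" returns the result in memory):
import xarray as xr

from clisops.ops.average import average_shape

ds = xr.open_dataset(path_to_pr_file)  # placeholder path
results = average_shape(ds, shape=path_to_shape_file, output_type="xarray")
ds_avg = results[0]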
- clisops.ops.average.average_time(ds: Dataset | str, freq: str, output_dir: str | Path | None = None, output_type: str = 'netcdf', split_method: str = 'time:auto', file_namer: str = 'standard') list[Dataset | str] [source]
Average a DataArray or Dataset over the time frequency specified.
- Parameters:
ds (Union[xr.Dataset, str]) – Xarray dataset.
freq (str) – The frequency to average over. One of “month”, “year”.
output_dir (Optional[Union[str, Path]])
output_type ({“netcdf”, “nc”, “zarr”, “xarray”})
split_method ({“time:auto”})
file_namer ({“standard”, “simple”})
- Returns:
List[Union[xr.Dataset, str]] – A list of the outputs in the format selected; str corresponds to file paths if the output format selected is a file.
Examples
- ds: xarray Dataset or "cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.latest.tas"
- freq: "month"
- output_dir: "/cache/wps/procs/req0111"
- output_type: "netcdf"
- split_method: "time:auto"
- file_namer: "standard"
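As a sketch (placeholder path):
import xarray as xr

from clisops.ops.average import average_time

ds = xr.open_dataset(path_to_pr_file)  # placeholder path
results = average_time(ds, freq="month", output_type="xarray")
ds_avg = results[0]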
Regrid operation
- clisops.ops.regrid.regrid(ds: Dataset | str | Path, *, method: str | None = 'nearest_s2d', adaptive_masking_threshold: int | float | None = 0.5, grid: Dataset | DataArray | int | float | tuple | str | None = 'adaptive', mask: str | None = None, output_dir: str | Path | None = None, output_type: str | None = 'netcdf', split_method: str | None = 'time:auto', file_namer: str | None = 'standard', keep_attrs: bool | str | None = True) list[Dataset | str] [source]
Regrid specified input file or xarray object.
- Parameters:
ds (Union[xr.Dataset, str])
method ({“nearest_s2d”, “conservative”, “patch”, “bilinear”})
adaptive_masking_threshold (Optional[Union[int, float]])
grid (Union[xr.Dataset, xr.DataArray, int, float, tuple, str])
mask ({“ocean”, “land”} = None)
output_dir (Optional[Union[str, Path]] = None)
output_type ({“netcdf”, “nc”, “zarr”, “xarray”})
split_method ({“time:auto”})
file_namer ({“standard”, “simple”})
keep_attrs ({True, False, “target”})
- Returns:
List[Union[xr.Dataset, str]] – A list of the regridded outputs in the format selected; str corresponds to file paths if the output format selected is a file.
Examples
- ds: xarray Dataset or "cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.latest.tas"
- method: "nearest_s2d"
- adaptive_masking_threshold:
- grid: "1deg"
- mask: "land"
- output_dir: "/cache/wps/procs/req0111"
- output_type: "netcdf"
- split_method: "time:auto"
- file_namer: "standard"
- keep_attrs: True
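A minimal sketch of the call (placeholder path; the method and target grid are illustrative choices):
import xarray as xr

from clisops.ops.regrid import regrid

ds = xr.open_dataset(path_to_file)  # placeholder path
results = regrid(ds, method="bilinear", grid="1deg", output_type="xarray")
ds_regridded = results[0]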
Common functions
- clisops.utils.common.check_dir(func: LambdaType, dr: str | Path)[source]
Decorator ensuring that the directory dr exists before the wrapped function/method is called.
- clisops.utils.common.expand_wildcards(paths: str | Path) list [source]
Expand the wildcards that may be present in Paths.
Output Utilities
- class clisops.utils.output_utils.FileLock(fpath)[source]
Bases:
object
Create and release a lockfile.
Adapted from https://github.com/cedadev/cmip6-object-store/cmip6_zarr/file_lock.py
- clisops.utils.output_utils.create_lock(fname: str | Path)[source]
Check whether a lockfile already exists and, if not, create one.
- Parameters:
fname (str) – Path of the lockfile to be created.
- Returns:
FileLock object or None.
- clisops.utils.output_utils.filter_times_within(times, start=None, end=None)[source]
Take an array of datetimes and return a reduced array if start or end times are defined and fall within the main array.
- clisops.utils.output_utils.get_chunk_length(da)[source]
Calculate the chunk length to use when chunking xarray datasets.
Based on memory limit provided in config and the size of the dataset.
- clisops.utils.output_utils.get_da(ds)[source]
Return an xr.DataArray from ds, which may be either an xr.Dataset or an xr.DataArray.
- clisops.utils.output_utils.get_format_extension(fmt)[source]
Finds the extension for the requested output format.
- clisops.utils.output_utils.get_format_writer(fmt)[source]
Finds the output method for the requested output format.
- clisops.utils.output_utils.get_output(ds, output_type, output_dir, namer)[source]
Return the output after determining the output format and applying chunking.
- clisops.utils.output_utils.get_time_slices(ds: Dataset | DataArray, split_method, start=None, end=None, file_size_limit: str = None) list[tuple[str, str]] [source]
Take an xarray Dataset or DataArray, assume it can be split on the time axis into a sequence of slices. Optionally, take a start and end date to specify a sub-slice of the main time axis.
Use the prescribed file size limit to generate a list of (“YYYY-MM-DD”, “YYYY-MM-DD”) slices so that the output files do not (significantly) exceed the file size limit.
- Parameters:
ds (Union[xr.Dataset, xr.DataArray])
split_method
start
end
file_size_limit (str) – A string specifying the limit in the form “<number><units>” (e.g. “2GB”).
- Returns:
List[Tuple[str, str]]
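A sketch of a possible call, assuming ds has a time axis and using the “<number><units>” form for the size limit:
from clisops.utils.output_utils import get_time_slices

slices = get_time_slices(ds, split_method="time:auto", file_size_limit="2GB")
# slices is a list of ("YYYY-MM-DD", "YYYY-MM-DD") tuples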
File Utilities
- class clisops.utils.file_utils.FileMapper(file_list, dirpath=None)[source]
Bases:
object
Class representing, as a single object, a set of files that exist in the same directory.
- Parameters:
file_list – The list of files to represent. If dirpath is not provided, these should be full file paths.
dirpath – The directory path where the files exist. Default is None.
If dirpath is not provided, it will be deduced from the file paths provided in file_list.
- Variables:
file_list – list of file names of the files represented.
file_paths – list of full file paths of the files represented.
dirpath – The directory path where the files exist. Either deduced or provided.
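An illustrative sketch (file names and directory are placeholders):
from clisops.utils.file_utils import FileMapper

fm = FileMapper(["tas_day_2000.nc", "tas_day_2001.nc"], dirpath="/data/tas")
print(fm.file_paths)  # full paths built from dirpath and file_list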
File Namers
- class clisops.utils.file_namers.SimpleFileNamer(replace=None, extra='')[source]
Bases:
_BaseFileNamer
Simple file namer class.
Generates numbered file names.
- class clisops.utils.file_namers.StandardFileNamer(replace=None, extra='')[source]
Bases:
SimpleFileNamer
Standard file namer class.
Generates file names based on input dataset.
Dataset Utilities
- clisops.utils.dataset_utils.add_hor_CF_coord_attrs(ds, lat='lat', lon='lon', lat_bnds='lat_bnds', lon_bnds='lon_bnds', keep_attrs=False)[source]
Add the common CF variable attributes to the horizontal coordinate variables.
- Parameters:
ds (xarray.Dataset) – The dataset whose coordinate variable attributes shall be updated.
lat (str, optional) – Latitude coordinate variable name. The default is “lat”.
lon (str, optional) – Longitude coordinate variable name. The default is “lon”.
lat_bnds (str, optional) – Latitude bounds coordinate variable name. The default is “lat_bnds”.
lon_bnds (str, optional) – Longitude bounds coordinate variable name. The default is “lon_bnds”.
keep_attrs (bool, optional) – Whether to keep original coordinate variable attributes if they do not conflict. In case of a conflict, the attribute value will be overwritten independent of this setting. The default is False.
- Returns:
xarray.Dataset – The input dataset with updated coordinate variable attributes.
- clisops.utils.dataset_utils.adjust_date_to_calendar(da, date, direction='backwards')[source]
Check that the date specified exists in the calendar type of the dataset.
If not present, changes the date a day at a time (up to a maximum of 5 times) to find a date that does exist. The direction to change the date by is indicated by ‘direction’.
- Parameters:
da (xarray.Dataset or xarray.DataArray) – The data to examine.
date (str) – The date to check.
direction (str) – The direction to move in days to find a date that does exist. ‘backwards’ means the search will go backwards in time until an existing date is found. ‘forwards’ means the search will go forwards in time. The default is ‘backwards’.
- Returns:
str – The next possible existing date in the calendar of the dataset.
- clisops.utils.dataset_utils.calculate_offset(lon, first_element_value)[source]
Calculate the number of elements to roll the dataset by in order to have the longitude coordinate start within the requested bounds.
- Parameters:
lon – Longitude coordinate of xarray dataset.
first_element_value – The value of the first element of the longitude array to roll to.
- clisops.utils.dataset_utils.cf_convert_between_lon_frames(ds_in, lon_interval, force=False)[source]
Convert ds or lon_interval (whichever is deemed appropriate) to the other longitude frame, if the longitude frames do not match.
If ds and lon_interval are defined on different longitude frames ([-180, 180] and [0, 360]), this function will convert one of the input parameters to the other longitude frame, preferably the lon_interval. Adjusts shifted longitude frames [0-x, 360-x] in the dataset to one of the two standard longitude frames, dependent on the specified lon_interval. In case of curvilinear grids featuring an additional 1D x-coordinate of the projection, this projection x-coordinate will not get converted.
- Parameters:
ds_in (xarray.Dataset or xarray.DataArray) – xarray data object with defined longitude dimension.
lon_interval (tuple or list) – length-2-tuple or -list of floats or integers denoting the bounds of the longitude interval.
force (bool) – If True, force conversion even if longitude frames match.
- Returns:
Tuple(ds, lon_low, lon_high) – The xarray.Dataset and the bounds of the longitude interval, potentially adjusted in terms of their defined longitude frame.
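A sketch of a typical call, requesting the [-180, 180] longitude frame (ds is a placeholder dataset with a longitude dimension):
from clisops.utils.dataset_utils import cf_convert_between_lon_frames

ds_adj, lon_low, lon_high = cf_convert_between_lon_frames(ds, (-180, 180))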
- clisops.utils.dataset_utils.check_lon_alignment(ds, lon_bnds)[source]
Check whether the longitude subset requested is within the bounds of the dataset.
If not try to roll the dataset so that the request is. Raise an exception if rolling is not possible.
- clisops.utils.dataset_utils.convert_coord_to_axis(coord)[source]
Convert a coordinate type to its single-character axis identifier (t, z, y, x).
- Parameters:
coord – (str) The coordinate to convert.
- Returns:
(str) The single character axis identifier of the coordinate (tzyx).
- clisops.utils.dataset_utils.detect_bounds(ds, coordinate) str | None [source]
Use cf_xarray to obtain the variable name of the requested coordinates bounds.
- Parameters:
ds (xarray.Dataset, xarray.DataArray) – Dataset the coordinate bounds variable name shall be obtained from.
coordinate (str) – Name of the coordinate variable to determine the bounds from.
- Returns:
str or None – Returns the variable name of the requested coordinate bounds. Returns None if the variable has no bounds or if they cannot be identified.
- clisops.utils.dataset_utils.detect_coordinate(ds, coord_type)[source]
Use cf_xarray to obtain the variable name of the requested coordinate.
- Parameters:
ds (xarray.Dataset, xarray.DataArray) – Dataset the coordinate variable name shall be obtained from.
coord_type (str) – Coordinate type understood by cf-xarray, e.g. ‘lat’, ‘lon’, …
- Raises:
KeyError – Raised if the requested coordinate cannot be identified.
- Returns:
str – Coordinate variable name.
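A sketch combining detect_coordinate and detect_bounds (ds is a placeholder dataset; the coordinate type follows cf-xarray’s vocabulary and the returned names depend on the dataset):
from clisops.utils.dataset_utils import detect_bounds, detect_coordinate

lat = detect_coordinate(ds, "latitude")  # coordinate variable name, e.g. "lat"
lat_bnds = detect_bounds(ds, lat)  # bounds variable name, or None if none can be identified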
- clisops.utils.dataset_utils.detect_format(ds)[source]
Detect the format of a dataset. Currently supported formats are ‘CF’, ‘SCRIP’ and ‘xESMF’.
- Parameters:
ds (xr.Dataset) – xarray.Dataset of which to detect the format.
- Returns:
str – The detected format, if supported. Otherwise an exception is raised.
- clisops.utils.dataset_utils.detect_gridtype(ds, lon, lat, lon_bnds=None, lat_bnds=None)[source]
Detect the grid type as one of “regular_lat_lon”, “curvilinear” or “unstructured”.
Assumes the grid description / structure follows the CF conventions.
- clisops.utils.dataset_utils.detect_shape(ds, lat, lon, grid_type) tuple[int, int, int] [source]
Detect the shape of the grid.
Returns a tuple of (nlat, nlon, ncells). For an unstructured grid nlat and nlon are not defined and therefore the returned tuple will be (ncells, ncells, ncells).
- Parameters:
ds (xr.Dataset) – Dataset containing the grid / coordinate variables.
lat (str) – Latitude variable name.
lon (str) – Longitude variable name.
grid_type (str) – One of “regular_lat_lon”, “curvilinear”, “unstructured”
- Returns:
int – Number of latitude points in the grid.
int – Number of longitude points in the grid.
int – Number of cells in the grid.
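A minimal sketch, assuming a regular lat/lon grid with coordinate variables named "lat" and "lon":
>>> from clisops.utils.dataset_utils import detect_shape
>>> nlat, nlon, ncells = detect_shape(ds, lat="lat", lon="lon", grid_type="regular_lat_lon")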
- clisops.utils.dataset_utils.determine_lon_lat_range(ds, lon, lat, lon_bnds=None, lat_bnds=None, apply_fix=True)[source]
Determine the min/max lon/lat values of the dataset (and potentially apply fix for unmasked missing values).
- Parameters:
ds (xarray.Dataset) – Input dataset.
lon (str) – Name of longitude coordinate.
lat (str) – Name of latitude coordinate.
lon_bnds (str or None, optional) – Name of longitude bounds coordinate. The default is None.
lat_bnds (str or None, optional) – Name of latitude bounds coordinate. The default is None.
apply_fix (bool, optional) – Whether to apply fix for unmasked missing values. The default is True.
- Returns:
xmin (float) – Minimum longitude value.
xmax (float) – Maximum longitude value.
ymin (float) – Minimum latitude value.
ymax (float) – Maximum latitude value.
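A minimal sketch, assuming coordinate variables named "lon" and "lat":
>>> from clisops.utils.dataset_utils import determine_lon_lat_range
>>> xmin, xmax, ymin, ymax = determine_lon_lat_range(ds, "lon", "lat")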
- clisops.utils.dataset_utils.fix_unmasked_missing_values_lon_lat(ds, lon, lat, lon_bnds, lat_bnds, xminmax, yminmax)[source]
Fix unmasked missing values in longitude and latitude coordinates and their bounds.
- Parameters:
ds (xarray.Dataset) – Input dataset.
lon (str) – Name of longitude coordinate.
lat (str) – Name of latitude coordinate.
lon_bnds (str or None) – Name of longitude bounds coordinate.
lat_bnds (str or None) – Name of latitude bounds coordinate.
xminmax (list) – List of minimum and maximum longitude values.
yminmax (list) – List of minimum and maximum latitude values.
- Returns:
fix (bool) – Whether the fix was applied to ds[lon] and ds[lat] (and, if specified, ds[lon_bnds] and ds[lat_bnds]).
- clisops.utils.dataset_utils.generate_bounds_curvilinear(ds, lat, lon, clip_latitude=True, roll=True)[source]
Compute bounds for curvilinear grids.
Assumes 2D latitude and longitude coordinate variables. The bounds will be attached as coords to the xarray.Dataset. Assumes the longitudes are defined on the longitude frame [-180, 180]. The default setting for ‘roll’ ensures that the longitudes are converted if this is not the case.
The bound calculation for curvilinear grids was adapted from https://github.com/SantanderMetGroup/ATLAS/blob/mai-devel/scripts/ATLAS-data/bash-interpolation-scripts/AtlasCDOremappeR_CORDEX/grid_bounds_calc.py, which is based on work by Caillaud Cécile and Samuel Somot from Meteo-France. Compared with the original code, an additional correction is performed in the calculation, ensuring that the sign of the bounds does not differ at the Greenwich meridian and the antimeridian. The new implementation is also significantly faster, as it replaces for-loops with numpy.vectorize and index slicing.
- Parameters:
ds (xarray.Dataset) – Dataset to compute the bounds for.
lon (str) – Longitude variable name.
lat (str) – Latitude variable name.
clip_latitude (bool, optional) – Whether to clip latitude values to [-90, 90]. The default is True.
roll (bool, optional) – Whether to roll longitude values to [-180, 180]. The default is True.
- Returns:
ds (xarray.Dataset) – Dataset with the bounds variables attached.
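A minimal sketch, assuming 2D coordinate variables named "lat" and "lon" on a curvilinear grid:
>>> from clisops.utils.dataset_utils import generate_bounds_curvilinear
>>> ds = generate_bounds_curvilinear(ds, lat="lat", lon="lon")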
- clisops.utils.dataset_utils.generate_bounds_rectilinear(ds, lat, lon)[source]
Compute bounds for rectilinear grids.
The bounds will be attached as coords to the xarray.Dataset of the Grid object. If no bounds can be created, a warning is issued. It is assumed but not ensured that no duplicated cells are present in the grid.
- Parameters:
ds (xarray.Dataset) – Dataset to compute the bounds for.
lat (str) – Latitude variable name.
lon (str) – Longitude variable name.
- Returns:
xarray.Dataset – Dataset with attached bounds.
- clisops.utils.dataset_utils.get_coord_by_attr(ds, attr, value)[source]
Return a coordinate based on a known attribute of the coordinate.
- Parameters:
ds – Xarray Dataset or DataArray
attr – (str) Name of attribute to look for.
value – Expected value of attribute you are looking for.
- Returns:
Coordinate of xarray dataset if found.
- clisops.utils.dataset_utils.get_coord_by_type(ds, coord_type, ignore_aux_coords=True, return_further_matches=False)[source]
Returns the name of the coordinate that matches the given type.
- Parameters:
ds (xarray.Dataset or xarray.DataArray) – Dataset/DataArray to search for coordinate.
coord_type (str) – Type of coordinate, e.g. ‘time’, ‘level’, ‘latitude’, ‘longitude’, ‘realization’.
ignore_aux_coords (bool) – Whether to ignore auxiliary coordinates.
return_further_matches (bool) – Whether to return further matches.
- Returns:
str – Name of the coordinate that matches the given type.
str, list of str – If return_further_matches is True, a list of further potential matches is returned in addition to the matching coordinate.
- Raises:
ValueError – If the coordinate type is not known.
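A minimal sketch, assuming ds contains a time coordinate:
>>> from clisops.utils.dataset_utils import get_coord_by_type
>>> time_name = get_coord_by_type(ds, "time")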
- clisops.utils.dataset_utils.get_coord_type(coord)[source]
Gets the coordinate type.
- Parameters:
coord – coordinate of xarray dataset e.g. coord = ds.coords[coord_id]
- Returns:
The type of the coordinate as a string: either ‘longitude’, ‘latitude’, ‘time’, ‘level’ or None.
- clisops.utils.dataset_utils.get_main_variable(ds, exclude_common_coords=True)[source]
Find the main variable of an xarray Dataset.
- Parameters:
ds – xarray Dataset
exclude_common_coords – (bool) If True, common coordinates are excluded from the search for the main variable. Common coordinates are time, level, latitude, longitude and bounds. Default is True.
- Returns:
(str) The main variable of the dataset e.g. ‘tas’
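A minimal sketch; the returned name depends entirely on the dataset’s contents:
>>> from clisops.utils.dataset_utils import get_main_variable
>>> get_main_variable(ds)  # e.g. 'tas'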
- clisops.utils.dataset_utils.is_kerchunk_file(dset)[source]
Return a boolean indicating whether the input is a kerchunk file, based on its file extension.
- clisops.utils.dataset_utils.is_latitude(coord)[source]
Determines if a coordinate is latitude.
- Parameters:
coord – coordinate of xarray dataset e.g. coord = ds.coords[coord_id]
- Returns:
(bool) True if the coordinate is latitude.
- clisops.utils.dataset_utils.is_level(coord)[source]
Determines if a coordinate is level.
- Parameters:
coord – coordinate of xarray dataset e.g. coord = ds.coords[coord_id]
- Returns:
(bool) True if the coordinate is level.
- clisops.utils.dataset_utils.is_longitude(coord)[source]
Determines if a coordinate is longitude.
- Parameters:
coord – coordinate of xarray dataset e.g. coord = ds.coords[coord_id]
- Returns:
(bool) True if the coordinate is longitude.
- clisops.utils.dataset_utils.is_realization(coord)[source]
Determines if a coordinate is realization.
- Parameters:
coord – coordinate of xarray dataset e.g. coord = ds.coords[coord_id]
- Returns:
(bool) True if the coordinate is realization.
- clisops.utils.dataset_utils.is_time(coord)[source]
Determines if a coordinate is time.
- Parameters:
coord – coordinate of xarray dataset e.g. coord = ds.coords[coord_id]
- Returns:
(bool) True if the coordinate is time.
- clisops.utils.dataset_utils.open_xr_dataset(dset, **kwargs)[source]
Opens an xarray dataset from a dataset input.
- Parameters:
dset – (str or Path) A dataset identifier, directory path, or file path ending in *.nc.
kwargs – Any additional keyword arguments for opening the dataset. use_cftime=True and decode_timedelta=False are used by default, along with combine="by_coords" for open_mfdataset only.
Notes
Any list will be interpreted as a list of files.
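A minimal sketch; the glob path is illustrative:
>>> from clisops.utils.dataset_utils import open_xr_dataset
>>> ds = open_xr_dataset("/path/to/files/*.nc")  # multiple files are combined by coordinates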
- clisops.utils.dataset_utils.reformat_SCRIP_to_CF(ds, keep_attrs=False)[source]
Reformat dataset from SCRIP to CF format.
- Parameters:
ds (xarray.Dataset) – Input dataset in SCRIP format.
keep_attrs (bool) – Whether to keep the global attributes.
- Returns:
ds_ref (xarray.Dataset) – Reformatted dataset.
- clisops.utils.dataset_utils.reformat_xESMF_to_CF(ds, keep_attrs=False)[source]
Reformat dataset from xESMF to CF format.
- Parameters:
ds (xarray.Dataset) – Input dataset in xESMF format.
keep_attrs (bool) – Whether to keep the global attributes.
- Returns:
ds_ref (xarray.Dataset) – Reformatted dataset.
Time Utilities
- class clisops.utils.time_utils.AnyCalendarDateTime(year, month, day, hour, minute, second)[source]
Bases:
object
A class to represent a datetime that could be of any calendar.
Supports adding and subtracting a day from the input, based on MAX_DAY, MIN_DAY, MAX_MONTH and MIN_MONTH.
- DAY_RANGE = range(1, 32)
- HOUR_RANGE = range(0, 24)
- MINUTE_RANGE = range(0, 60)
- MONTH_RANGE = range(1, 13)
- SECOND_RANGE = range(0, 60)
- property value
- clisops.utils.time_utils.create_time_bounds(ds, freq)[source]
Generate time bounds for datasets that have been temporally averaged.
Averaging frequencies supported are yearly, monthly and daily.
- clisops.utils.time_utils.str_to_AnyCalendarDateTime(dt, defaults=None)[source]
Take a string representing a date/time and return an AnyCalendarDateTime object. String formats should start with the year and go through to the second, but any component from the month onwards may be omitted.
- Parameters:
dt – (str) string representing a date/time.
defaults – (list) The default values to use for year, month, day, hour, minute and second if they cannot be parsed from the string. A default value must be provided for each component. If defaults=None, [-1, 1, 1, 0, 0, 0] is used.
- Returns:
AnyCalendarDateTime object
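A minimal sketch; the omitted components are filled from the defaults described above:
>>> from clisops.utils.time_utils import str_to_AnyCalendarDateTime
>>> dt = str_to_AnyCalendarDateTime("2085-06")  # day, hour, minute, second taken from defaults
>>> dt.value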
Parameters
- class clisops.parameter.area_parameter.AreaParameter(input)[source]
Bases:
_BaseParameter
Class for area parameter used in subsetting operation.
Area can be input as:
- A string of comma separated values: “0.,49.,10.,65”
- A sequence of strings: (“0”, “-10”, “120”, “40”)
- A sequence of numbers: [0, 49.5, 10, 65]
An area must have 4 values.
Validates the area input and parses the values into numbers.
- allowed_input_types = [<class 'collections.abc.Sequence'>, <class 'str'>, <class 'clisops.parameter._utils.Series'>, <class 'NoneType'>]
- class clisops.parameter.collection_parameter.CollectionParameter(input)[source]
Bases:
_BaseParameter
Class for collection parameter used in operations.
A collection can be input as:
- A string of comma separated values: “cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga,cmip5.output1.MPI-M.MPI-ESM-LR.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga”
- A sequence of strings: e.g. (“cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga”, “cmip5.output1.MPI-M.MPI-ESM-LR.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga”)
- A sequence of clisops.utils.file_utils.FileMapper objects
Validates the input and parses the items.
- allowed_input_types = [<class 'collections.abc.Sequence'>, <class 'str'>, <class 'clisops.parameter._utils.Series'>, <class 'clisops.utils.file_utils.FileMapper'>]
- class clisops.parameter.level_parameter.LevelParameter(input)[source]
Bases:
_BaseIntervalOrSeriesParameter
Class for level parameter used in subsetting operation.
Level can be input as:
- A string of slash separated values: “1000/2000”
- A sequence of strings: e.g. (“1000.50”, “2000.60”)
- A sequence of numbers: e.g. (1000.50, 2000.60)
A level input must be 2 values.
If using a string input, leaving one side of the slash empty indicates that the lowest/highest level of the dataset should be used, e.g. “/2000” will subset from the lowest level in the dataset up to 2000.
Validates the level input and parses the values into numbers.
- class clisops.parameter.time_parameter.TimeParameter(input)[source]
Bases:
_BaseIntervalOrSeriesParameter
Class for time parameter used in subsetting operation.
Time can be input as:
- A string of slash separated values: “2085-01-01T12:00:00Z/2120-12-30T12:00:00Z”
- A sequence of strings: e.g. (“2085-01-01T12:00:00Z”, “2120-12-30T12:00:00Z”)
A time input must be 2 values.
If using a string input, leaving one side of the slash empty indicates that the earliest/latest time of the dataset should be used, e.g. “2085-01-01T12:00:00Z/” will subset from 2085-01-01 to the final time in the dataset.
Validates the times input and parses the values into isoformat.
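A minimal construction sketch using the string format documented above (how the parsed values are accessed afterwards is not shown here):
>>> from clisops.parameter.time_parameter import TimeParameter
>>> times = TimeParameter("2085-01-01T12:00:00Z/2120-12-30T12:00:00Z")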
- class clisops.parameter.time_components_parameter.TimeComponentsParameter(input)[source]
Bases:
_BaseParameter
Class for time components parameter used in subsetting operation.
- The Time Components are any, or none of:
year: [list of years]
month: [list of months]
day: [list of days]
hour: [list of hours]
minute: [list of minutes]
second: [list of seconds]
- month is special: you can use either strings or values:
“feb”, “mar” == 2, 3 == “02,03”
Validates the times input and parses them into a dictionary.
- allowed_input_types = [<class 'dict'>, <class 'str'>, <class 'clisops.parameter._utils.TimeComponents'>, <class 'NoneType'>]
- class clisops.parameter.dimension_parameter.DimensionParameter(input)[source]
Bases:
_BaseParameter
Class for dimensions parameter used in averaging operation.
Dimensions can be input as:
- A string of comma separated values: “time,latitude,longitude”
- A sequence of strings: (“time”, “longitude”)
Dimensions can be None or any number of options from time, latitude, longitude and level, provided these exist in the dataset being operated on.
Validates the dims input and parses the values into a sequence of strings.
- allowed_input_types = [<class 'collections.abc.Sequence'>, <class 'str'>, <class 'clisops.parameter._utils.Series'>, <class 'NoneType'>]
- clisops.parameter.parameterise.parameterise(collection=None, area=None, level=None, time=None, time_components=None)[source]
Parameterise inputs into instances of parameter classes, which allows them to be used throughout roocs. For the supported formats of each input, please see the individual classes.
- Parameters:
collection – Collection input in any supported format.
area – Area input in any supported format.
level – Level input in any supported format.
time – Time input in any supported format.
time_components – Time Components input in any supported format.
- Returns:
Parameters as instances of their respective classes.
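A hedged sketch using the input formats documented above; the collection identifier is illustrative:
>>> from clisops.parameter.parameterise import parameterise
>>> params = parameterise(
...     collection="cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga",
...     area="0.,49.,10.,65",
...     time="2085-01-01T12:00:00Z/2120-12-30T12:00:00Z",
... )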
Project Utils
- class clisops.project_utils.DatasetMapper(dset, project=None, force=False)[source]
Bases:
object
Class to map to data path, dataset ID and files from any dataset input.
Dset can be input as:
A dataset ID: e.g. “cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga”.
A file path: e.g. “/badc/cmip5/data/cmip5/output1/MOHC/HadGEM2-ES/rcp85/mon/atmos/Amon/r1i1p1/latest/tas/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_200512-203011.nc”.
A path to a group of files: e.g. “/badc/cmip5/data/cmip5/output1/MOHC/HadGEM2-ES/rcp85/mon/atmos/Amon/r1i1p1/latest/tas/*.nc”.
A directory e.g. “/badc/cmip5/data/cmip5/output1/MOHC/HadGEM2-ES/rcp85/mon/atmos/Amon/r1i1p1/latest/tas”.
An instance of the FileMapper class (that represents a set of files within a single directory).
When force=True, if the project cannot be identified, any attempt to use a project’s base_dir to resolve the data path will be skipped. Whichever of data_path, ds_id and files can be set will be set.
- SUPPORTED_EXTENSIONS = ('.nc', '.gz')
- property base_dir
The base directory of the input dataset.
- property data_path
Dataset input converted to a data path.
- property ds_id
Dataset input converted to a ds id.
- property files
The files found from the input dataset.
- property project
The project of the dataset input.
- property raw
Raw dataset input.
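A hedged sketch, assuming the cmip5 project (including its base_dir) is configured in the roocs configuration:
>>> from clisops.project_utils import DatasetMapper
>>> dm = DatasetMapper("cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga")
>>> dm.ds_id, dm.data_path, dm.files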
- clisops.project_utils.datapath_to_dsid(datapath)[source]
Switches from dataset path to ds id.
- Parameters:
datapath – dataset path.
- Returns:
dataset id of input dataset path.
- clisops.project_utils.derive_ds_id(dset)[source]
Derives the dataset id of the provided dset.
- Parameters:
dset – dset input of type described by DatasetMapper.
- Returns:
ds id of input dataset.
- clisops.project_utils.derive_dset(dset)[source]
Derives the dataset path of the provided dset.
- Parameters:
dset – dset input of type described by DatasetMapper.
- Returns:
dataset path of input dataset.
- clisops.project_utils.dset_to_filepaths(dset, force=False)[source]
Gets filepaths deduced from input dset.
- Parameters:
dset – dset input of type described by DatasetMapper.
force – When True and if the project of the input dset cannot be identified, DatasetMapper will attempt to find the files anyway. Default is False.
- Returns:
File paths deduced from input dataset.
- clisops.project_utils.dsid_to_datapath(dsid)[source]
Switches from ds id to dataset path.
- Parameters:
dsid – dataset id.
- Returns:
dataset path of input dataset id.
- clisops.project_utils.get_data_node_dirs_dict()[source]
Get a dictionary of the data node roots used for retrieving original files.
- clisops.project_utils.get_project_base_dir(project)[source]
Get the base directory of a project from the config.
- clisops.project_utils.get_project_from_data_node_root(url)[source]
Identify the project by matching the data node root in the input URL.
- clisops.project_utils.get_project_from_ds(ds)[source]
Gets the project from an xarray Dataset/DataArray.
- Parameters:
ds – xarray Dataset/DataArray.
- Returns:
The project derived from the input dataset.
- clisops.project_utils.get_project_name(dset)[source]
Gets the project from an input dset.
- Parameters:
dset – dset input of type described by DatasetMapper.
- Returns:
The project derived from the input dataset.
- clisops.project_utils.map_facet(facet, project)[source]
Return the mapped facet value from the config, or the facet name if no mapping is found.