Subsetting
The subset operation makes use of clisops.core.subset
to process the datasets and to set the output type and the output file names.
[1]:
# Initialize the testing data
import clisops.utils.testing as clite
Stratus = clite.stratus(repo=clite.XCLIM_TEST_DATA_REPO_URL, branch=clite.XCLIM_TEST_DATA_VERSION, cache_dir=clite.XCLIM_TEST_DATA_CACHE_DIR)
# fetch files locally or from GitHub
tas_files = [
"cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_200512-203011.nc",
"cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_203012-205511.nc",
"cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_205512-208011.nc",
"cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_208012-209912.nc",
"cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_209912-212411.nc",
"cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_212412-214911.nc",
"cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_214912-217411.nc",
"cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_217412-219911.nc",
"cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_219912-222411.nc",
"cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_222412-224911.nc",
"cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_224912-227411.nc",
"cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_227412-229911.nc",
"cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_229912-229912.nc"
]
for i, name in enumerate(tas_files):
tas_files[i] = Stratus.fetch(name)
o3_file = Stratus.fetch("cmip6/o3_Amon_GFDL-ESM4_historical_r1i1p1f1_gr1_185001-194912.nc")
# remove previously created example file
import os
if os.path.exists("./output_001.nc"):
os.remove("./output_001.nc")
Downloading file 'cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_208012-209912.nc' from 'https://raw.githubusercontent.com/Ouranosinc/xclim-testdata/v2024.8.23/data/cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_208012-209912.nc' to '/home/docs/.cache/xclim-testdata/v2024.8.23'.
Downloading file 'cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_209912-212411.nc' from 'https://raw.githubusercontent.com/Ouranosinc/xclim-testdata/v2024.8.23/data/cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_209912-212411.nc' to '/home/docs/.cache/xclim-testdata/v2024.8.23'.
Downloading file 'cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_212412-214911.nc' from 'https://raw.githubusercontent.com/Ouranosinc/xclim-testdata/v2024.8.23/data/cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_212412-214911.nc' to '/home/docs/.cache/xclim-testdata/v2024.8.23'.
Downloading file 'cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_214912-217411.nc' from 'https://raw.githubusercontent.com/Ouranosinc/xclim-testdata/v2024.8.23/data/cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_214912-217411.nc' to '/home/docs/.cache/xclim-testdata/v2024.8.23'.
Downloading file 'cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_217412-219911.nc' from 'https://raw.githubusercontent.com/Ouranosinc/xclim-testdata/v2024.8.23/data/cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_217412-219911.nc' to '/home/docs/.cache/xclim-testdata/v2024.8.23'.
Downloading file 'cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_219912-222411.nc' from 'https://raw.githubusercontent.com/Ouranosinc/xclim-testdata/v2024.8.23/data/cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_219912-222411.nc' to '/home/docs/.cache/xclim-testdata/v2024.8.23'.
Downloading file 'cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_222412-224911.nc' from 'https://raw.githubusercontent.com/Ouranosinc/xclim-testdata/v2024.8.23/data/cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_222412-224911.nc' to '/home/docs/.cache/xclim-testdata/v2024.8.23'.
Downloading file 'cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_224912-227411.nc' from 'https://raw.githubusercontent.com/Ouranosinc/xclim-testdata/v2024.8.23/data/cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_224912-227411.nc' to '/home/docs/.cache/xclim-testdata/v2024.8.23'.
Downloading file 'cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_227412-229911.nc' from 'https://raw.githubusercontent.com/Ouranosinc/xclim-testdata/v2024.8.23/data/cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_227412-229911.nc' to '/home/docs/.cache/xclim-testdata/v2024.8.23'.
Downloading file 'cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_229912-229912.nc' from 'https://raw.githubusercontent.com/Ouranosinc/xclim-testdata/v2024.8.23/data/cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_229912-229912.nc' to '/home/docs/.cache/xclim-testdata/v2024.8.23'.
[2]:
from clisops.ops.subset import subset
import xarray as xr
The subset
process takes several parameters:
Subsetting Parameters
ds: Union[xr.Dataset, str, Path]
time: Optional[Union[str, TimeParameter]]
area: Optional[
Union[
str,
Tuple[
Union[int, float, str],
Union[int, float, str],
Union[int, float, str],
Union[int, float, str],
],
AreaParameter,
]
]
level: Optional[
Union[
str, LevelParameter
]
]
time_components: Optional[Union[str, Dict, TimeComponentsParameter]]
output_dir: Optional[Union[str, Path]]
output_type: {"netcdf", "nc", "zarr", "xarray"}
split_method: {"time:auto"}
file_namer: {"standard"}
The output is a list containing the outputs in the format selected.
[3]:
ds = xr.open_mfdataset(tas_files, decode_times=xr.coders.CFDatetimeCoder(use_cftime=True), combine="by_coords")
Output to xarray
There will only be one output for this example.
[4]:
outputs = subset(
ds=ds,
time="2007-01-01T00:00:00/2200-12-30T00:00:00",
area=(0.0, 10.0, 175.0, 90.0),
output_type="xarray",
)
print(f"There is only {len(outputs)} output.")
outputs[0]
There is only 1 output.
[4]:
<xarray.Dataset> Size: 140kB Dimensions: (time: 2329, lat: 1, bnds: 2, lon: 1) Coordinates: height float64 8B 1.5 * lat (lat) float64 8B 35.0 * lon (lon) float64 8B 0.0 * time (time) object 19kB 2007-01-16 00:00:00 ... 2200-12-16 00:00:00 Dimensions without coordinates: bnds Data variables: lat_bnds (time, lat, bnds) float64 37kB dask.array<chunksize=(287, 1, 2), meta=np.ndarray> lon_bnds (time, lon, bnds) float64 37kB dask.array<chunksize=(287, 1, 2), meta=np.ndarray> tas (time, lat, lon) float32 9kB dask.array<chunksize=(287, 1, 1), meta=np.ndarray> time_bnds (time, bnds) object 37kB dask.array<chunksize=(287, 2), meta=np.ndarray> Attributes: (12/29) institution: Met Office Hadley Centre, Fitzroy Road, Exeter, D... institute_id: MOHC experiment_id: rcp85 source: HadGEM2-ES (2009) atmosphere: HadGAM2 (N96L38); o... model_id: HadGEM2-ES forcing: GHG, SA, Oz, LU, Sl, Vl, BC, OC, (GHG = CO2, N2O,... ... ... title: HadGEM2-ES model output prepared for CMIP5 RCP8.5 parent_experiment: historical modeling_realm: atmos realization: 1 cmor_version: 2.5.0 NCO: 4.7.3
Output to netCDF with simple namer
There is only one output as the file size is under the memory limit so does not need to be split. This example uses the simple namer which numbers output files.
[5]:
outputs = subset(
ds=ds,
time="2007-01-01T00:00:00/2200-12-30T00:00:00",
area=(0.0, 10.0, 175.0, 90.0),
output_type="nc",
output_dir=".",
split_method="time:auto",
file_namer="simple"
)
[6]:
# To open the file
subset_ds = xr.open_mfdataset("./output_001.nc", decode_times=xr.coders.CFDatetimeCoder(use_cftime=True), combine="by_coords")
subset_ds
[6]:
<xarray.Dataset> Size: 140kB Dimensions: (time: 2329, lat: 1, bnds: 2, lon: 1) Coordinates: height float64 8B ... * lat (lat) float64 8B 35.0 * lon (lon) float64 8B 0.0 * time (time) object 19kB 2007-01-16 00:00:00 ... 2200-12-16 00:00:00 Dimensions without coordinates: bnds Data variables: lat_bnds (time, lat, bnds) float64 37kB dask.array<chunksize=(2329, 1, 2), meta=np.ndarray> lon_bnds (time, lon, bnds) float64 37kB dask.array<chunksize=(2329, 1, 2), meta=np.ndarray> tas (time, lat, lon) float32 9kB dask.array<chunksize=(2329, 1, 1), meta=np.ndarray> time_bnds (time, bnds) object 37kB dask.array<chunksize=(2329, 2), meta=np.ndarray> Attributes: (12/29) institution: Met Office Hadley Centre, Fitzroy Road, Exeter, D... institute_id: MOHC experiment_id: rcp85 source: HadGEM2-ES (2009) atmosphere: HadGAM2 (N96L38); o... model_id: HadGEM2-ES forcing: GHG, SA, Oz, LU, Sl, Vl, BC, OC, (GHG = CO2, N2O,... ... ... title: HadGEM2-ES model output prepared for CMIP5 RCP8.5 parent_experiment: historical modeling_realm: atmos realization: 1 cmor_version: 2.5.0 NCO: 4.7.3
Output to netCDF with standard namer
There is only one output as the file size is under the memory limit so does not need to be split. This example uses the standard namer which names output filesa ccording the the input file and how it has been subsetted.
[7]:
outputs = subset(
ds=ds,
time="2007-01-01T00:00:00/2200-12-30T00:00:00",
area=(0.0, 10.0, 175.0, 90.0),
output_type="nc",
output_dir=".",
split_method="time:auto",
file_namer="standard"
)
Subsetting by level
[8]:
ds = xr.open_dataset(o3_file, decode_times=xr.coders.CFDatetimeCoder(use_cftime=True))
No subsetting applied
[9]:
result = subset(ds=ds,
output_type="xarray")
result[0].coords
[9]:
Coordinates:
* lat (lat) float64 16B -89.5 10.5
* lon (lon) float64 24B 0.625 125.6 250.6
* plev (plev) float64 152B 1e+05 9.25e+04 8.5e+04 ... 1e+03 500.0 100.0
* time (time) object 10kB 1850-01-16 12:00:00 ... 1949-12-16 12:00:00
Subsetting over level
[10]:
# subsetting over pressure level (plev)
result = subset(ds=ds,
level="600/100",
output_type="xarray")
print(result[0].coords)
print(f"\nplev has been subsetted and now only has {len(result[0].coords)} values.")
Coordinates:
* lat (lat) float64 16B -89.5 10.5
* lon (lon) float64 24B 0.625 125.6 250.6
* plev (plev) float64 16B 500.0 100.0
* time (time) object 10kB 1850-01-16 12:00:00 ... 1949-12-16 12:00:00
plev has been subsetted and now only has 4 values.
Use time components
[11]:
ds = xr.open_mfdataset(tas_files, decode_times=xr.coders.CFDatetimeCoder(use_cftime=True), combine="by_coords")
[12]:
outputs = subset(
ds=ds,
time_components="year: 2010, 2020, 2030|month: 12, 1, 2",
output_type="xarray",
)
print(f"There is only {len(outputs)} output.")
outputs[0]
There is only 1 output.
[12]:
<xarray.Dataset> Size: 976B Dimensions: (time: 9, lat: 2, bnds: 2, lon: 2) Coordinates: height float64 8B 1.5 * lat (lat) float64 16B -90.0 35.0 * lon (lon) float64 16B 0.0 187.5 * time (time) object 72B 2010-01-16 00:00:00 ... 2030-12-16 00:00:00 Dimensions without coordinates: bnds Data variables: lat_bnds (time, lat, bnds) float64 288B dask.array<chunksize=(9, 2, 2), meta=np.ndarray> lon_bnds (time, lon, bnds) float64 288B dask.array<chunksize=(9, 2, 2), meta=np.ndarray> tas (time, lat, lon) float32 144B dask.array<chunksize=(9, 2, 2), meta=np.ndarray> time_bnds (time, bnds) object 144B dask.array<chunksize=(9, 2), meta=np.ndarray> Attributes: (12/29) institution: Met Office Hadley Centre, Fitzroy Road, Exeter, D... institute_id: MOHC experiment_id: rcp85 source: HadGEM2-ES (2009) atmosphere: HadGAM2 (N96L38); o... model_id: HadGEM2-ES forcing: GHG, SA, Oz, LU, Sl, Vl, BC, OC, (GHG = CO2, N2O,... ... ... title: HadGEM2-ES model output prepared for CMIP5 RCP8.5 parent_experiment: historical modeling_realm: atmos realization: 1 cmor_version: 2.5.0 NCO: 4.7.3
Using parameter classes
[13]:
from clisops.parameter import (
time_components,
time_interval,
)
[14]:
ds = xr.open_mfdataset(tas_files, decode_times=xr.coders.CFDatetimeCoder(use_cftime=True), combine="by_coords")
[15]:
outputs = subset(
ds=ds,
time=time_interval("2007-01-01T00:00:00", "2200-12-30T00:00:00"),
time_components=time_components(month=["dec", "jan", "feb"]),
output_type="xarray",
)
print(f"There is only {len(outputs)} output.")
outputs[0]
There is only 1 output.
[15]:
<xarray.Dataset> Size: 61kB Dimensions: (time: 583, lat: 2, bnds: 2, lon: 2) Coordinates: height float64 8B 1.5 * lat (lat) float64 16B -90.0 35.0 * lon (lon) float64 16B 0.0 187.5 * time (time) object 5kB 2007-01-16 00:00:00 ... 2200-12-16 00:00:00 Dimensions without coordinates: bnds Data variables: lat_bnds (time, lat, bnds) float64 19kB dask.array<chunksize=(258, 2, 2), meta=np.ndarray> lon_bnds (time, lon, bnds) float64 19kB dask.array<chunksize=(258, 2, 2), meta=np.ndarray> tas (time, lat, lon) float32 9kB dask.array<chunksize=(258, 2, 2), meta=np.ndarray> time_bnds (time, bnds) object 9kB dask.array<chunksize=(258, 2), meta=np.ndarray> Attributes: (12/29) institution: Met Office Hadley Centre, Fitzroy Road, Exeter, D... institute_id: MOHC experiment_id: rcp85 source: HadGEM2-ES (2009) atmosphere: HadGAM2 (N96L38); o... model_id: HadGEM2-ES forcing: GHG, SA, Oz, LU, Sl, Vl, BC, OC, (GHG = CO2, N2O,... ... ... title: HadGEM2-ES model output prepared for CMIP5 RCP8.5 parent_experiment: historical modeling_realm: atmos realization: 1 cmor_version: 2.5.0 NCO: 4.7.3