Averaging over dimensions of the dataset
The average over dimensions operation makes use of clisops.core.average
to process the datasets and to set the output type and the output file names.
It is possible to average over none or any number of time, longitude, latitude or level dimensions in the dataset.
[1]:
from clisops.utils import get_file
# fetch files locally or from github
tas_files = get_file([
"cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_200512-203011.nc",
"cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_203012-205511.nc",
"cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_205512-208011.nc",
])
o3_file = get_file("cmip6/o3_Amon_GFDL-ESM4_historical_r1i1p1f1_gr1_185001-194912.nc")
# remove previously created example file
import os
if os.path.exists("./output_001.nc"):
os.remove("./output_001.nc")
Parameters
Parameters taken by the average_over_dims
are below:
ds: Union[xr.Dataset, str]
dims : Optional[Union[Tuple[str], DimensionParameter]]
The dimensions over which to apply the average. If None, none of the dimensions are averaged over. Dimensions
must be one of ["time", "level", "latitude", "longitude"].
ignore_undetected_dims: bool
If the dimensions specified are not found in the dataset, an Exception will be raised if set to True.
If False, an exception will not be raised and the other dimensions will be averaged over. Default = False
output_dir: Optional[Union[str, Path]] = None
output_type: {"netcdf", "nc", "zarr", "xarray"}
split_method: {"time:auto"}
file_namer: {"standard", "simple"}
The output is a list containing the outputs in the format selected.
[2]:
from clisops.ops.average import average_over_dims
from roocs_utils.exceptions import InvalidParameterValue
import xarray as xr
[3]:
ds = xr.open_mfdataset(tas_files, use_cftime=True, combine="by_coords")
ds
[3]:
<xarray.Dataset> Dimensions: (lat: 2, time: 900, bnds: 2, lon: 2) Coordinates: height float64 1.5 * lat (lat) float64 -90.0 35.0 * lon (lon) float64 0.0 187.5 * time (time) object 2005-12-16 00:00:00 ... 2080-11-16 00:00:00 Dimensions without coordinates: bnds Data variables: lat_bnds (time, lat, bnds) float64 dask.array<chunksize=(300, 2, 2), meta=np.ndarray> lon_bnds (time, lon, bnds) float64 dask.array<chunksize=(300, 2, 2), meta=np.ndarray> tas (time, lat, lon) float32 dask.array<chunksize=(300, 2, 2), meta=np.ndarray> time_bnds (time, bnds) object dask.array<chunksize=(300, 2), meta=np.ndarray> Attributes: (12/29) institution: Met Office Hadley Centre, Fitzroy Road, Exeter, D... institute_id: MOHC experiment_id: rcp85 source: HadGEM2-ES (2009) atmosphere: HadGAM2 (N96L38); o... model_id: HadGEM2-ES forcing: GHG, SA, Oz, LU, Sl, Vl, BC, OC, (GHG = CO2, N2O,... ... ... title: HadGEM2-ES model output prepared for CMIP5 RCP8.5 parent_experiment: historical modeling_realm: atmos realization: 1 cmor_version: 2.5.0 NCO: 4.7.3
Average over one dimension
[4]:
result = average_over_dims(ds, dims=["time"], ignore_undetected_dims=False, output_type="xarray")
result[0]
[4]:
<xarray.Dataset> Dimensions: (lat: 2, bnds: 2, lon: 2) Coordinates: height float64 1.5 * lat (lat) float64 -90.0 35.0 * lon (lon) float64 0.0 187.5 Dimensions without coordinates: bnds Data variables: lat_bnds (lat, bnds) float64 dask.array<chunksize=(2, 2), meta=np.ndarray> lon_bnds (lon, bnds) float64 dask.array<chunksize=(2, 2), meta=np.ndarray> tas (lat, lon) float32 dask.array<chunksize=(2, 2), meta=np.ndarray> Attributes: (12/29) institution: Met Office Hadley Centre, Fitzroy Road, Exeter, D... institute_id: MOHC experiment_id: rcp85 source: HadGEM2-ES (2009) atmosphere: HadGAM2 (N96L38); o... model_id: HadGEM2-ES forcing: GHG, SA, Oz, LU, Sl, Vl, BC, OC, (GHG = CO2, N2O,... ... ... title: HadGEM2-ES model output prepared for CMIP5 RCP8.5 parent_experiment: historical modeling_realm: atmos realization: 1 cmor_version: 2.5.0 NCO: 4.7.3
As you can see in the output dataset, time has been averaged over and has been removed.
Average over two dimensions
Averaging over two dimensions is just as simple as averaging over one. The dimensions to be averaged over should be passed in as a sequence.
[5]:
result = average_over_dims(ds, dims=["time", "latitude"], ignore_undetected_dims=False, output_type="xarray")
result[0]
[5]:
<xarray.Dataset> Dimensions: (bnds: 2, lon: 2) Coordinates: height float64 1.5 * lon (lon) float64 0.0 187.5 Dimensions without coordinates: bnds Data variables: lat_bnds (bnds) float64 dask.array<chunksize=(2,), meta=np.ndarray> lon_bnds (lon, bnds) float64 dask.array<chunksize=(2, 2), meta=np.ndarray> tas (lon) float32 dask.array<chunksize=(2,), meta=np.ndarray> Attributes: (12/29) institution: Met Office Hadley Centre, Fitzroy Road, Exeter, D... institute_id: MOHC experiment_id: rcp85 source: HadGEM2-ES (2009) atmosphere: HadGAM2 (N96L38); o... model_id: HadGEM2-ES forcing: GHG, SA, Oz, LU, Sl, Vl, BC, OC, (GHG = CO2, N2O,... ... ... title: HadGEM2-ES model output prepared for CMIP5 RCP8.5 parent_experiment: historical modeling_realm: atmos realization: 1 cmor_version: 2.5.0 NCO: 4.7.3
In this case both the time and latitude dimensions have been removed.
Allowed dimensions
It is only possible to average over longtiude, latitude, level and time. If a different dimension is provided to average over an error will be raised.
[6]:
try:
average_over_dims(
ds,
dims=["incorrect_dim"],
ignore_undetected_dims=False,
output_type="xarray",
)
except InvalidParameterValue as exc:
print(exc)
Dimensions for averaging must be one of ['time', 'level', 'latitude', 'longitude', 'realization']
Dimensions not found
In the case where a dimension has been selected for averaging but it doesn’t exist in the dataset, there are 2 options.
To raise an exception when the dimension doesn’t exist, set
ignore_undetected_dims = False
[7]:
try:
average_over_dims(
ds,
dims=["level", "time"],
ignore_undetected_dims=False,
output_type="xarray",
)
except InvalidParameterValue as exc:
print(exc)
Requested dimensions were not found in input dataset: {'level'}.
To ignore when the dimension doesn’t exist, and average over any other requested dimensions anyway, set
ignore_undetected_dims = True
[8]:
result = average_over_dims(
ds,
dims=["level", "time"],
ignore_undetected_dims=True,
output_type="xarray",
)
result[0]
[8]:
<xarray.Dataset> Dimensions: (lat: 2, bnds: 2, lon: 2) Coordinates: height float64 1.5 * lat (lat) float64 -90.0 35.0 * lon (lon) float64 0.0 187.5 Dimensions without coordinates: bnds Data variables: lat_bnds (lat, bnds) float64 dask.array<chunksize=(2, 2), meta=np.ndarray> lon_bnds (lon, bnds) float64 dask.array<chunksize=(2, 2), meta=np.ndarray> tas (lat, lon) float32 dask.array<chunksize=(2, 2), meta=np.ndarray> Attributes: (12/29) institution: Met Office Hadley Centre, Fitzroy Road, Exeter, D... institute_id: MOHC experiment_id: rcp85 source: HadGEM2-ES (2009) atmosphere: HadGAM2 (N96L38); o... model_id: HadGEM2-ES forcing: GHG, SA, Oz, LU, Sl, Vl, BC, OC, (GHG = CO2, N2O,... ... ... title: HadGEM2-ES model output prepared for CMIP5 RCP8.5 parent_experiment: historical modeling_realm: atmos realization: 1 cmor_version: 2.5.0 NCO: 4.7.3
In the case above, a level dimension did not exist, but this was ignored and time was averaged over anyway.
No dimensions supplied
If no dimensions are supplied, no averaging will be applied and the original dataset will be returned.
[9]:
result = average_over_dims(
ds,
dims=None,
ignore_undetected_dims=False,
output_type="xarray"
)
result[0]
---------------------------------------------------------------------------
InvalidParameterValue Traceback (most recent call last)
Cell In[9], line 1
----> 1 result = average_over_dims(
2 ds,
3 dims=None,
4 ignore_undetected_dims=False,
5 output_type="xarray"
6 )
8 result[0]
File ~/checkouts/readthedocs.org/user_builds/clisops/conda/latest/lib/python3.11/site-packages/clisops/ops/average.py:89, in average_over_dims(ds, dims, ignore_undetected_dims, output_dir, output_type, split_method, file_namer)
54 """Calculate an average over given dimensions.
55
56 Parameters
(...)
86
87 """
88 op = Average(**locals())
---> 89 return op.process()
File ~/checkouts/readthedocs.org/user_builds/clisops/conda/latest/lib/python3.11/site-packages/clisops/ops/base_operation.py:219, in Operation.process(self)
215 namer = self._get_file_namer()
217 # Process the xarray Dataset - this will (usually) be lazily evaluated so
218 # no actual data will be read
--> 219 processed_ds = self._calculate()
221 # remove fill values from lat/lon/time if required
222 processed_ds = self._remove_redundant_fill_values(processed_ds)
File ~/checkouts/readthedocs.org/user_builds/clisops/conda/latest/lib/python3.11/site-packages/clisops/ops/average.py:36, in Average._calculate(self)
35 def _calculate(self):
---> 36 avg_ds = average.average_over_dims(
37 self.ds,
38 self.params.get("dims", None),
39 self.params.get("ignore_undetected_dims", None),
40 )
42 return avg_ds
File ~/checkouts/readthedocs.org/user_builds/clisops/conda/latest/lib/python3.11/site-packages/clisops/core/average.py:186, in average_over_dims(ds, dims, ignore_undetected_dims)
152 """Average a DataArray or Dataset over the dimensions specified.
153
154 Parameters
(...)
182 )
183 """
185 if not dims:
--> 186 raise InvalidParameterValue(
187 "At least one dimension for averaging must be provided"
188 )
190 if not set(dims).issubset(set(known_coord_types)):
191 raise InvalidParameterValue(
192 f"Dimensions for averaging must be one of {known_coord_types}"
193 )
InvalidParameterValue: At least one dimension for averaging must be provided
An example of averaging over level
[10]:
print("Original dataset")
print(xr.open_dataset(o3_file, use_cftime=True))
result = average_over_dims(
o3_file,
dims=["level"],
ignore_undetected_dims=False,
output_type="xarray",
)
print("Averaged dataset")
result[0]
Original dataset
<xarray.Dataset>
Dimensions: (lat: 2, bnds: 2, lon: 3, time: 1200, plev: 19)
Coordinates:
* lat (lat) float64 -89.5 10.5
* lon (lon) float64 0.625 125.6 250.6
* plev (plev) float64 1e+05 9.25e+04 8.5e+04 7e+04 ... 1e+03 500.0 100.0
* time (time) object 1850-01-16 12:00:00 ... 1949-12-16 12:00:00
Dimensions without coordinates: bnds
Data variables:
lat_bnds (lat, bnds) float64 ...
lon_bnds (lon, bnds) float64 ...
o3 (time, plev, lat, lon) float32 ...
time_bnds (time, bnds) object ...
Attributes: (12/47)
external_variables: areacella
history: Fri Oct 2 16:56:04 2020: ncks -d lat,,,100 -d lo...
table_id: Amon
activity_id: CMIP
branch_method: standard
branch_time_in_child: 0.0
... ...
tracking_id: hdl:21.14100/2601cb41-0071-4ec0-bee6-45045c81dab9
variable_id: o3
variant_info: N/A
references: see further_info_url attribute
variant_label: r1i1p1f1
NCO: netCDF Operators version 4.9.2 (Homepage = http:/...
Averaged dataset
[10]:
<xarray.Dataset> Dimensions: (lat: 2, lon: 3, time: 1200, bnds: 2) Coordinates: * lat (lat) float64 -89.5 10.5 * lon (lon) float64 0.625 125.6 250.6 * time (time) object 1850-01-16 12:00:00 ... 1949-12-16 12:00:00 Dimensions without coordinates: bnds Data variables: o3 (time, lat, lon) float32 1.629e-06 1.63e-06 ... 1.661e-06 lat_bnds (lat, bnds) float64 ... lon_bnds (lon, bnds) float64 ... time_bnds (time, bnds) object ... Attributes: (12/47) external_variables: areacella history: Fri Oct 2 16:56:04 2020: ncks -d lat,,,100 -d lo... table_id: Amon activity_id: CMIP branch_method: standard branch_time_in_child: 0.0 ... ... tracking_id: hdl:21.14100/2601cb41-0071-4ec0-bee6-45045c81dab9 variable_id: o3 variant_info: N/A references: see further_info_url attribute variant_label: r1i1p1f1 NCO: netCDF Operators version 4.9.2 (Homepage = http:/...
In the above, the dimension plev
has be removed and averaged over