Averaging over dimensions of the dataset

The average over dimensions operation makes use of clisops.core.average to process the datasets and to set the output type and the output file names.

It is possible to average over none or any number of time, longitude, latitude or level dimensions in the dataset.

[1]:
from clisops.utils import get_file
# fetch files locally or from github
tas_files = get_file([
    "cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_200512-203011.nc",
    "cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_203012-205511.nc",
    "cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_205512-208011.nc",
])

o3_file = get_file("cmip6/o3_Amon_GFDL-ESM4_historical_r1i1p1f1_gr1_185001-194912.nc")

# remove previously created example file
import os
if os.path.exists("./output_001.nc"):
    os.remove("./output_001.nc")

Parameters

Parameters taken by the average_over_dims are below:

ds: Union[xr.Dataset, str]
dims : Optional[Union[Tuple[str], DimensionParameter]]
  The dimensions over which to apply the average. If None, none of the dimensions are averaged over. Dimensions
  must be one of ["time", "level", "latitude", "longitude"].
ignore_undetected_dims: bool
  If the dimensions specified are not found in the dataset, an Exception will be raised if set to True.
  If False, an exception will not be raised and the other dimensions will be averaged over. Default = False
output_dir: Optional[Union[str, Path]] = None
output_type: {"netcdf", "nc", "zarr", "xarray"}
split_method: {"time:auto"}
file_namer: {"standard", "simple"}

The output is a list containing the outputs in the format selected.

[2]:
from clisops.ops.average import average_over_dims
from roocs_utils.exceptions import InvalidParameterValue
import xarray as xr
[3]:
ds = xr.open_mfdataset(tas_files, use_cftime=True, combine="by_coords")

ds
[3]:
<xarray.Dataset>
Dimensions:    (lat: 2, time: 900, bnds: 2, lon: 2)
Coordinates:
    height     float64 1.5
  * lat        (lat) float64 -90.0 35.0
  * lon        (lon) float64 0.0 187.5
  * time       (time) object 2005-12-16 00:00:00 ... 2080-11-16 00:00:00
Dimensions without coordinates: bnds
Data variables:
    lat_bnds   (time, lat, bnds) float64 dask.array<chunksize=(300, 2, 2), meta=np.ndarray>
    lon_bnds   (time, lon, bnds) float64 dask.array<chunksize=(300, 2, 2), meta=np.ndarray>
    tas        (time, lat, lon) float32 dask.array<chunksize=(300, 2, 2), meta=np.ndarray>
    time_bnds  (time, bnds) object dask.array<chunksize=(300, 2), meta=np.ndarray>
Attributes: (12/29)
    institution:            Met Office Hadley Centre, Fitzroy Road, Exeter, D...
    institute_id:           MOHC
    experiment_id:          rcp85
    source:                 HadGEM2-ES (2009) atmosphere: HadGAM2 (N96L38); o...
    model_id:               HadGEM2-ES
    forcing:                GHG, SA, Oz, LU, Sl, Vl, BC, OC, (GHG = CO2, N2O,...
    ...                     ...
    title:                  HadGEM2-ES model output prepared for CMIP5 RCP8.5
    parent_experiment:      historical
    modeling_realm:         atmos
    realization:            1
    cmor_version:           2.5.0
    NCO:                    4.7.3

Average over one dimension

[4]:
result = average_over_dims(ds, dims=["time"], ignore_undetected_dims=False, output_type="xarray")

result[0]

[4]:
<xarray.Dataset>
Dimensions:   (lat: 2, bnds: 2, lon: 2)
Coordinates:
    height    float64 1.5
  * lat       (lat) float64 -90.0 35.0
  * lon       (lon) float64 0.0 187.5
Dimensions without coordinates: bnds
Data variables:
    lat_bnds  (lat, bnds) float64 dask.array<chunksize=(2, 2), meta=np.ndarray>
    lon_bnds  (lon, bnds) float64 dask.array<chunksize=(2, 2), meta=np.ndarray>
    tas       (lat, lon) float32 dask.array<chunksize=(2, 2), meta=np.ndarray>
Attributes: (12/29)
    institution:            Met Office Hadley Centre, Fitzroy Road, Exeter, D...
    institute_id:           MOHC
    experiment_id:          rcp85
    source:                 HadGEM2-ES (2009) atmosphere: HadGAM2 (N96L38); o...
    model_id:               HadGEM2-ES
    forcing:                GHG, SA, Oz, LU, Sl, Vl, BC, OC, (GHG = CO2, N2O,...
    ...                     ...
    title:                  HadGEM2-ES model output prepared for CMIP5 RCP8.5
    parent_experiment:      historical
    modeling_realm:         atmos
    realization:            1
    cmor_version:           2.5.0
    NCO:                    4.7.3

As you can see in the output dataset, time has been averaged over and has been removed.

Average over two dimensions

Averaging over two dimensions is just as simple as averaging over one. The dimensions to be averaged over should be passed in as a sequence.

[5]:
result = average_over_dims(ds, dims=["time", "latitude"], ignore_undetected_dims=False, output_type="xarray")

result[0]
[5]:
<xarray.Dataset>
Dimensions:   (bnds: 2, lon: 2)
Coordinates:
    height    float64 1.5
  * lon       (lon) float64 0.0 187.5
Dimensions without coordinates: bnds
Data variables:
    lat_bnds  (bnds) float64 dask.array<chunksize=(2,), meta=np.ndarray>
    lon_bnds  (lon, bnds) float64 dask.array<chunksize=(2, 2), meta=np.ndarray>
    tas       (lon) float32 dask.array<chunksize=(2,), meta=np.ndarray>
Attributes: (12/29)
    institution:            Met Office Hadley Centre, Fitzroy Road, Exeter, D...
    institute_id:           MOHC
    experiment_id:          rcp85
    source:                 HadGEM2-ES (2009) atmosphere: HadGAM2 (N96L38); o...
    model_id:               HadGEM2-ES
    forcing:                GHG, SA, Oz, LU, Sl, Vl, BC, OC, (GHG = CO2, N2O,...
    ...                     ...
    title:                  HadGEM2-ES model output prepared for CMIP5 RCP8.5
    parent_experiment:      historical
    modeling_realm:         atmos
    realization:            1
    cmor_version:           2.5.0
    NCO:                    4.7.3

In this case both the time and latitude dimensions have been removed.

Allowed dimensions

It is only possible to average over longtiude, latitude, level and time. If a different dimension is provided to average over an error will be raised.

[6]:
try:
    average_over_dims(
                ds,
                dims=["incorrect_dim"],
                ignore_undetected_dims=False,
                output_type="xarray",
    )
except InvalidParameterValue as exc:
    print(exc)
Dimensions for averaging must be one of ['time', 'level', 'latitude', 'longitude', 'realization']

Dimensions not found

In the case where a dimension has been selected for averaging but it doesn’t exist in the dataset, there are 2 options.

  1. To raise an exception when the dimension doesn’t exist, set ignore_undetected_dims = False

[7]:
try:
    average_over_dims(
        ds,
        dims=["level", "time"],
        ignore_undetected_dims=False,
        output_type="xarray",
    )
except InvalidParameterValue as exc:
    print(exc)
Requested dimensions were not found in input dataset: {'level'}.
  1. To ignore when the dimension doesn’t exist, and average over any other requested dimensions anyway, set ignore_undetected_dims = True

[8]:
result = average_over_dims(
        ds,
        dims=["level", "time"],
        ignore_undetected_dims=True,
        output_type="xarray",
)
result[0]
[8]:
<xarray.Dataset>
Dimensions:   (lat: 2, bnds: 2, lon: 2)
Coordinates:
    height    float64 1.5
  * lat       (lat) float64 -90.0 35.0
  * lon       (lon) float64 0.0 187.5
Dimensions without coordinates: bnds
Data variables:
    lat_bnds  (lat, bnds) float64 dask.array<chunksize=(2, 2), meta=np.ndarray>
    lon_bnds  (lon, bnds) float64 dask.array<chunksize=(2, 2), meta=np.ndarray>
    tas       (lat, lon) float32 dask.array<chunksize=(2, 2), meta=np.ndarray>
Attributes: (12/29)
    institution:            Met Office Hadley Centre, Fitzroy Road, Exeter, D...
    institute_id:           MOHC
    experiment_id:          rcp85
    source:                 HadGEM2-ES (2009) atmosphere: HadGAM2 (N96L38); o...
    model_id:               HadGEM2-ES
    forcing:                GHG, SA, Oz, LU, Sl, Vl, BC, OC, (GHG = CO2, N2O,...
    ...                     ...
    title:                  HadGEM2-ES model output prepared for CMIP5 RCP8.5
    parent_experiment:      historical
    modeling_realm:         atmos
    realization:            1
    cmor_version:           2.5.0
    NCO:                    4.7.3

In the case above, a level dimension did not exist, but this was ignored and time was averaged over anyway.

No dimensions supplied

If no dimensions are supplied, no averaging will be applied and the original dataset will be returned.

[9]:
result = average_over_dims(
        ds,
        dims=None,
        ignore_undetected_dims=False,
        output_type="xarray"
)

result[0]
---------------------------------------------------------------------------
InvalidParameterValue                     Traceback (most recent call last)
Cell In[9], line 1
----> 1 result = average_over_dims(
      2         ds,
      3         dims=None,
      4         ignore_undetected_dims=False,
      5         output_type="xarray"
      6 )
      8 result[0]

File ~/checkouts/readthedocs.org/user_builds/clisops/conda/latest/lib/python3.11/site-packages/clisops/ops/average.py:89, in average_over_dims(ds, dims, ignore_undetected_dims, output_dir, output_type, split_method, file_namer)
     54 """Calculate an average over given dimensions.
     55
     56 Parameters
   (...)
     86
     87 """
     88 op = Average(**locals())
---> 89 return op.process()

File ~/checkouts/readthedocs.org/user_builds/clisops/conda/latest/lib/python3.11/site-packages/clisops/ops/base_operation.py:219, in Operation.process(self)
    215 namer = self._get_file_namer()
    217 # Process the xarray Dataset - this will (usually) be lazily evaluated so
    218 # no actual data will be read
--> 219 processed_ds = self._calculate()
    221 # remove fill values from lat/lon/time if required
    222 processed_ds = self._remove_redundant_fill_values(processed_ds)

File ~/checkouts/readthedocs.org/user_builds/clisops/conda/latest/lib/python3.11/site-packages/clisops/ops/average.py:36, in Average._calculate(self)
     35 def _calculate(self):
---> 36     avg_ds = average.average_over_dims(
     37         self.ds,
     38         self.params.get("dims", None),
     39         self.params.get("ignore_undetected_dims", None),
     40     )
     42     return avg_ds

File ~/checkouts/readthedocs.org/user_builds/clisops/conda/latest/lib/python3.11/site-packages/clisops/core/average.py:186, in average_over_dims(ds, dims, ignore_undetected_dims)
    152 """Average a DataArray or Dataset over the dimensions specified.
    153
    154 Parameters
   (...)
    182     )
    183 """
    185 if not dims:
--> 186     raise InvalidParameterValue(
    187         "At least one dimension for averaging must be provided"
    188     )
    190 if not set(dims).issubset(set(known_coord_types)):
    191     raise InvalidParameterValue(
    192         f"Dimensions for averaging must be one of {known_coord_types}"
    193     )

InvalidParameterValue: At least one dimension for averaging must be provided

An example of averaging over level

[10]:
print("Original dataset")
print(xr.open_dataset(o3_file, use_cftime=True))

result = average_over_dims(
        o3_file,
        dims=["level"],
        ignore_undetected_dims=False,
        output_type="xarray",
    )


print("Averaged dataset")
result[0]
Original dataset
<xarray.Dataset>
Dimensions:    (lat: 2, bnds: 2, lon: 3, time: 1200, plev: 19)
Coordinates:
  * lat        (lat) float64 -89.5 10.5
  * lon        (lon) float64 0.625 125.6 250.6
  * plev       (plev) float64 1e+05 9.25e+04 8.5e+04 7e+04 ... 1e+03 500.0 100.0
  * time       (time) object 1850-01-16 12:00:00 ... 1949-12-16 12:00:00
Dimensions without coordinates: bnds
Data variables:
    lat_bnds   (lat, bnds) float64 ...
    lon_bnds   (lon, bnds) float64 ...
    o3         (time, plev, lat, lon) float32 ...
    time_bnds  (time, bnds) object ...
Attributes: (12/47)
    external_variables:     areacella
    history:                Fri Oct  2 16:56:04 2020: ncks -d lat,,,100 -d lo...
    table_id:               Amon
    activity_id:            CMIP
    branch_method:          standard
    branch_time_in_child:   0.0
    ...                     ...
    tracking_id:            hdl:21.14100/2601cb41-0071-4ec0-bee6-45045c81dab9
    variable_id:            o3
    variant_info:           N/A
    references:             see further_info_url attribute
    variant_label:          r1i1p1f1
    NCO:                    netCDF Operators version 4.9.2 (Homepage = http:/...
Averaged dataset
[10]:
<xarray.Dataset>
Dimensions:    (lat: 2, lon: 3, time: 1200, bnds: 2)
Coordinates:
  * lat        (lat) float64 -89.5 10.5
  * lon        (lon) float64 0.625 125.6 250.6
  * time       (time) object 1850-01-16 12:00:00 ... 1949-12-16 12:00:00
Dimensions without coordinates: bnds
Data variables:
    o3         (time, lat, lon) float32 1.629e-06 1.63e-06 ... 1.661e-06
    lat_bnds   (lat, bnds) float64 ...
    lon_bnds   (lon, bnds) float64 ...
    time_bnds  (time, bnds) object ...
Attributes: (12/47)
    external_variables:     areacella
    history:                Fri Oct  2 16:56:04 2020: ncks -d lat,,,100 -d lo...
    table_id:               Amon
    activity_id:            CMIP
    branch_method:          standard
    branch_time_in_child:   0.0
    ...                     ...
    tracking_id:            hdl:21.14100/2601cb41-0071-4ec0-bee6-45045c81dab9
    variable_id:            o3
    variant_info:           N/A
    references:             see further_info_url attribute
    variant_label:          r1i1p1f1
    NCO:                    netCDF Operators version 4.9.2 (Homepage = http:/...

In the above, the dimension plev has be removed and averaged over