Averaging over dimensions of the dataset
The average over dimensions operation makes use of clisops.core.average
to process the datasets and to set the output type and the output file names.
You can average over any combination of the time, longitude, latitude and level dimensions of a dataset, or over none of them.
[1]:
from clisops.utils.testing import stratus, XCLIM_TEST_DATA_VERSION, XCLIM_TEST_DATA_REPO_URL, XCLIM_TEST_DATA_CACHE_DIR

Stratus = stratus(repo=XCLIM_TEST_DATA_REPO_URL, branch=XCLIM_TEST_DATA_VERSION, cache_dir=XCLIM_TEST_DATA_CACHE_DIR)

# fetch files locally or from GitHub
tas_files = [
    Stratus.fetch("cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_200512-203011.nc"),
    Stratus.fetch("cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_203012-205511.nc"),
    Stratus.fetch("cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_205512-208011.nc"),
]

o3_file = Stratus.fetch("cmip6/o3_Amon_GFDL-ESM4_historical_r1i1p1f1_gr1_185001-194912.nc")

# remove previously created example file
import os

if os.path.exists("./output_001.nc"):
    os.remove("./output_001.nc")
Parameters
The parameters accepted by average_over_dims are listed below:
ds: Union[xr.Dataset, str]
dims: Optional[Union[Tuple[str], DimensionParameter]]
    The dimensions over which to apply the average. If None, no dimensions are averaged over. Each dimension
    must be one of ["time", "level", "latitude", "longitude"].
ignore_undetected_dims: bool
    If False, an exception is raised when a requested dimension is not found in the dataset.
    If True, no exception is raised and the remaining requested dimensions are averaged over. Default = False
output_dir: Optional[Union[str, Path]] = None
output_type: {"netcdf", "nc", "zarr", "xarray"}
split_method: {"time:auto"}
file_namer: {"standard", "simple"}
The output is a list containing the results in the selected format.
[2]:
from clisops.ops.average import average_over_dims
from clisops.exceptions import InvalidParameterValue
import xarray as xr
[3]:
ds = xr.open_mfdataset(tas_files, use_cftime=True, combine="by_coords")
ds
Average over one dimension
[4]:
result = average_over_dims(ds, dims=["time"], ignore_undetected_dims=False, output_type="xarray")
result[0]
In the output dataset, the time dimension has been averaged over and removed.
Average over two dimensions
Averaging over two dimensions is just as simple as averaging over one. The dimensions to be averaged over should be passed in as a sequence.
[5]:
result = average_over_dims(ds, dims=["time", "latitude"], ignore_undetected_dims=False, output_type="xarray")
result[0]
In this case both the time and latitude dimensions have been removed.
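The effect on the dimensions can be reproduced on a small synthetic dataset. The following is a minimal sketch using plain xarray rather than clisops; the dataset, the variable name tas and the dimension names time, lat and lon are assumptions made for illustration, and a plain unweighted mean is assumed.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a tas dataset: 4 time steps on a 3 x 2 grid.
# All names and values here are illustrative, not taken from the test data.
data = np.arange(24, dtype="float64").reshape(4, 3, 2)
ds_demo = xr.Dataset(
    {"tas": (("time", "lat", "lon"), data)},
    coords={
        "time": np.arange(4),
        "lat": [-45.0, 0.0, 45.0],
        "lon": [0.0, 180.0],
    },
)

# Averaging over a dimension collapses it, so it disappears from the result.
time_mean = ds_demo.mean(dim="time")
time_lat_mean = ds_demo.mean(dim=["time", "lat"])

print(dict(ds_demo.sizes))        # all three dimensions present
print(dict(time_mean.sizes))      # time is gone
print(dict(time_lat_mean.sizes))  # only lon remains
```

Comparing the three printed size mappings shows which dimensions survive each averaging step.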
Allowed dimensions
It is only possible to average over longitude, latitude, level and time. If any other dimension is requested, an error is raised.
[6]:
try:
    average_over_dims(
        ds,
        dims=["incorrect_dim"],
        ignore_undetected_dims=False,
        output_type="xarray",
    )
except InvalidParameterValue as exc:
    print(exc)
Dimensions not found
If a dimension selected for averaging doesn’t exist in the dataset, there are two options.
To raise an exception when the dimension doesn’t exist, set
ignore_undetected_dims = False
[7]:
try:
    average_over_dims(
        ds,
        dims=["level", "time"],
        ignore_undetected_dims=False,
        output_type="xarray",
    )
except InvalidParameterValue as exc:
    print(exc)
To ignore when the dimension doesn’t exist, and average over any other requested dimensions anyway, set
ignore_undetected_dims = True
[8]:
result = average_over_dims(
    ds,
    dims=["level", "time"],
    ignore_undetected_dims=True,
    output_type="xarray",
)
result[0]
In the case above, a level dimension did not exist, but this was ignored and time was averaged over anyway.
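The rules for allowed and undetected dimensions can be summarised in a few lines of plain Python. This is an illustrative sketch only: resolve_dims and ALLOWED_DIMS are hypothetical names, not part of clisops, and the sketch raises ValueError where the examples above catch InvalidParameterValue.

```python
# Hypothetical helper for illustration; this is NOT the clisops implementation.
ALLOWED_DIMS = ("time", "level", "latitude", "longitude")


def resolve_dims(requested, detected, ignore_undetected_dims=False):
    """Return the requested dimensions that can actually be averaged over.

    requested: sequence of generic dimension names, or None.
    detected: collection of generic dimension names found in the dataset.
    """
    if requested is None:
        # Nothing requested, so nothing is averaged.
        return []

    unknown = [dim for dim in requested if dim not in ALLOWED_DIMS]
    if unknown:
        raise ValueError(f"Dimensions must be one of {list(ALLOWED_DIMS)}, got {unknown}")

    missing = [dim for dim in requested if dim not in detected]
    if missing and not ignore_undetected_dims:
        raise ValueError(f"Requested dimensions were not found: {missing}")

    # Keep only the dimensions that exist, preserving the requested order.
    return [dim for dim in requested if dim in detected]


# A dataset like the tas example has no level dimension.
detected_dims = {"time", "latitude", "longitude"}
print(resolve_dims(["level", "time"], detected_dims, ignore_undetected_dims=True))
```

With ignore_undetected_dims=False the same call raises instead of silently dropping level, which mirrors the two behaviours described in this section.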
No dimensions supplied
If no dimensions are supplied, no averaging will be applied and the original dataset will be returned.
[9]:
result = average_over_dims(
    ds,
    dims=None,
    ignore_undetected_dims=False,
    output_type="xarray",
)

result[0]
An example of averaging over level
[10]:
print("Original dataset")
print(xr.open_dataset(o3_file, use_cftime=True))

result = average_over_dims(
    o3_file,
    dims=["level"],
    ignore_undetected_dims=False,
    output_type="xarray",
)

print("Averaged dataset")
result[0]
In the example above, the plev dimension has been averaged over and removed.
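The request names the generic dimension level, while the coordinate in this dataset is called plev. One plausible way to bridge the two is to look at CF metadata such as the axis attribute. The sketch below illustrates that idea with plain dictionaries; detect_dims, AXIS_TO_TYPE and the attribute values are assumptions for illustration, and are neither read from the o3 file nor taken from the clisops implementation.

```python
# Hypothetical lookup from the CF "axis" attribute to a generic dimension type.
AXIS_TO_TYPE = {"T": "time", "Z": "level", "Y": "latitude", "X": "longitude"}


def detect_dims(coord_attrs):
    """Map generic dimension types to coordinate names via the 'axis' attribute.

    coord_attrs: dict of coordinate name -> dict of that coordinate's attributes.
    Coordinates without a recognised axis attribute are skipped.
    """
    found = {}
    for name, attrs in coord_attrs.items():
        dim_type = AXIS_TO_TYPE.get(attrs.get("axis"))
        if dim_type is not None:
            found[dim_type] = name
    return found


# Illustrative attributes only; a real file may describe its coordinates differently.
o3_coord_attrs = {
    "time": {"axis": "T", "standard_name": "time"},
    "plev": {"axis": "Z", "standard_name": "air_pressure"},
    "lat": {"axis": "Y", "standard_name": "latitude"},
    "lon": {"axis": "X", "standard_name": "longitude"},
    "bnds": {},
}

detected = detect_dims(o3_coord_attrs)
print(detected["level"])  # the coordinate that a request for "level" would target
```

Under these assumptions, asking for level resolves to plev, which is consistent with plev being the dimension that disappears from the averaged dataset.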