API

xarrayutils.utils.aggregate(da, blocks, func=<function nanmean>, debug=False)[source]

Performs efficient block averaging in one or multiple dimensions. Only works on regular grid dimensions.

Parameters
  • da (xarray DataArray (must be a dask array!)) –

  • blocks (list) – List of tuples containing the dimension and interval to aggregate over

  • func (function) – Aggregation function. Defaults to numpy.nanmean

Returns

da_agg – Aggregated array

Return type

xarray DataArray

Examples

>>> from xarrayutils import aggregate
>>> import numpy as np
>>> import xarray as xr
>>> import matplotlib.pyplot as plt
>>> %matplotlib inline
>>> import dask.array as da
>>> x = np.arange(-10,10)
>>> y = np.arange(-10,10)
>>> xx,yy = np.meshgrid(x,y)
>>> z = xx**2-yy**2
>>> a = xr.DataArray(da.from_array(z, chunks=(20, 20)),
                     coords={'x':x,'y':y}, dims=['y','x'])
>>> print(a)

<xarray.DataArray ‘array-7e422c91624f207a5f7ebac426c01769’ (y: 20, x: 20)> dask.array<array-7…, shape=(20, 20), dtype=int64, chunksize=(20, 20)> Coordinates:

  • y (y) int64 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9

  • x (x) int64 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9

>>> blocks = [('x',2),('y',10)]
>>> a_coarse = aggregate(a,blocks,func=np.mean)
>>> print(a_coarse)

<xarray.DataArray ‘array-7e422c91624f207a5f7ebac426c01769’ (y: 2, x: 10)> dask.array<coarsen…, shape=(2, 10), dtype=float64, chunksize=(2, 10)> Coordinates:

  • y (y) int64 -10 0

  • x (x) int64 -10 -8 -6 -4 -2 0 2 4 6 8

Attributes:

  • Coarsened with: <function mean at 0x111754230>

  • Coarsenblocks: [('x', 2), ('y', 10)]
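As a rough illustration of what block averaging does, the coarsening shown in the printed output above can be sketched in plain NumPy with a reshape. This is not the library's implementation (which works on dask-backed DataArrays); the helper name block_mean is hypothetical, and the sketch assumes both dimensions are exact multiples of the block sizes.

```python
import numpy as np

def block_mean(z, ny, nx):
    """Average a 2D array over non-overlapping (ny, nx) blocks.

    Hypothetical helper for illustration; assumes both dimensions
    are exact multiples of the block sizes.
    """
    rows, cols = z.shape
    # Split each axis into (number of blocks, block size) ...
    blocked = z.reshape(rows // ny, ny, cols // nx, nx)
    # ... and average over the two block-size axes.
    return np.nanmean(blocked, axis=(1, 3))

x = np.arange(-10, 10)
y = np.arange(-10, 10)
xx, yy = np.meshgrid(x, y)
z = xx**2 - yy**2

# Blocks of 10 along y and 2 along x give a (2, 10) result
z_coarse = block_mean(z, ny=10, nx=2)
```

Each entry of z_coarse is the mean of one 10 by 2 block of z, which is the same reduction the example above requests with func=np.mean.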

xarrayutils.utils.aggregate_w_nanmean(da, weights, blocks, **kwargs)[source]

Weighted nanmean for xarray objects.
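A minimal NumPy sketch of a weighted mean that skips missing values may help clarify the idea. The helper weighted_nanmean is hypothetical and ignores the block structure handled by the real function; it only shows that weights belonging to NaN values are excluded from the normalization.

```python
import numpy as np

def weighted_nanmean(values, weights):
    """Weighted mean that ignores NaNs in `values` (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    valid = ~np.isnan(values)
    # Weights of missing values must not contribute to the normalization
    return np.sum(values[valid] * weights[valid]) / np.sum(weights[valid])

result = weighted_nanmean([1.0, np.nan, 3.0], [1.0, 5.0, 3.0])
```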

xarrayutils.utils.composite(data, index, bounds)[source]

Composites a DataArray according to index

Parameters
  • data (xarray.DataArray) –

  • index (xarray.DataArray) – Timeseries matching one dimension of ‘data’. Values lower (higher) than ‘bounds’ are composited in an additional coordinate

  • bounds (int or array_like) – Values determining the values of ‘index’ composited into [‘low’,’neutral’,’high’]. If given as int, bounds will be computed as [-std(index), std(index)] * bounds.

Returns

composited_array – xarray like data with additional composite-coordinate [‘low’,’neutral’,’high’] based on ‘bounds’

Return type

array_like

Examples

TODO
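Until an official example exists, the classification step described above can be sketched with NumPy. The helper classify_index is hypothetical; it assumes strict inequalities at the bounds and interprets a scalar bounds as a multiple of the standard deviation of the index, as the parameter description suggests.

```python
import numpy as np

def classify_index(index, bounds):
    """Label each index value as 'low', 'neutral' or 'high'.

    Illustrative helper. A scalar `bounds` is interpreted as a multiple
    of the standard deviation of `index`.
    """
    index = np.asarray(index, dtype=float)
    if np.isscalar(bounds):
        std = np.std(index)
        lower, upper = -std * bounds, std * bounds
    else:
        lower, upper = bounds
    labels = np.full(index.shape, "neutral", dtype=object)
    labels[index < lower] = "low"
    labels[index > upper] = "high"
    return labels

index = np.array([-2.0, -0.1, 0.0, 0.1, 2.0])
data = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
labels = classify_index(index, bounds=1)

# Mean of `data` within each composite class
composite_mean = {
    name: data[labels == name].mean() for name in ("low", "neutral", "high")
}
```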

xarrayutils.utils.concat_dim_da(data, name)[source]

Creates an xarray.DataArray to label the concat dimension in xarray.concat. data is the dimension array and name is its name.

xarrayutils.utils.convert_flux_array(da, da_full, dim, top=True, fillval=0)[source]
xarrayutils.utils.corrmap(a, b, shifts=0, a_x_dim='i', a_y_dim='j', a_x_coord=None, a_y_coord=None, b_x_dim='i', b_y_dim='j', b_x_coord=None, b_y_coord=None, t_dim='time', debug=True)[source]

  • a – input

  • b – target

TODO This thing is slow. I can most likely rewrite this with numpy.apply_along_axis

xarrayutils.utils.dll_dist(dlon, dlat, lon, lat)[source]

Converts lat/lon differentials into distances

Parameters
  • dlon (xarray.DataArray longitude differentials) –

  • dlat (xarray.DataArray latitude differentials) –

  • lon (xarray.DataArray longitude values) –

  • lat (xarray.DataArray latitude values) –

Returns

  • dx (xarray.DataArray distance inferred from dlon)

  • dy (xarray.DataArray distance inferred from dlat)
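The conversion can be sketched under a spherical Earth assumption. The radius value, the helper name lonlat_diff_to_distance, and the choice of metres as output unit are assumptions made for this illustration and are not taken from the library; unlike dll_dist, the sketch does not need the lon argument.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6  # assumed mean Earth radius in metres

def lonlat_diff_to_distance(dlon, dlat, lat):
    """Convert differentials in degrees to approximate distances in metres."""
    metres_per_degree = np.deg2rad(1.0) * EARTH_RADIUS_M
    # Meridians converge towards the poles, so dx shrinks with cos(lat)
    dx = dlon * metres_per_degree * np.cos(np.deg2rad(lat))
    dy = dlat * metres_per_degree
    return dx, dy

dx_equator, dy_equator = lonlat_diff_to_distance(1.0, 1.0, lat=0.0)
dx_60n, _ = lonlat_diff_to_distance(1.0, 1.0, lat=60.0)
```

At 60 degrees latitude one degree of longitude covers half the distance it covers at the equator, which is why lat is needed to interpret dlon.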

xarrayutils.utils.extractBox(da, box, xdim='lon', ydim='lat')[source]
xarrayutils.utils.extractBox_dict(ds, box, concat_wrap=True)[source]

Advanced box extraction from xarray Dataset

xarrayutils.utils.extractBoxes(da, bo, xname=None, yname=None, xdim='lon', ydim='lat')[source]
xarrayutils.utils.fancymean(raw, dim=None, axis=None, method='arithmetic', weights=None, debug=False)[source]

Extended mean function for xarray

Applies various methods to estimate mean values {arithmetic,geometric,harmonic} along the specified dimension with optional weighting values, which can be a coordinate in the passed xarray structure
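The three mean types can be written out in a few lines of NumPy. This is a sketch of the standard weighted definitions, not the library code; the helper name mean_sketch is hypothetical and it does not handle dimensions or NaNs.

```python
import numpy as np

def mean_sketch(values, method="arithmetic", weights=None):
    """Weighted arithmetic, geometric or harmonic mean of a 1D sequence."""
    values = np.asarray(values, dtype=float)
    if weights is None:
        weights = np.ones_like(values)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    if method == "arithmetic":
        return np.sum(weights * values) / total
    if method == "geometric":
        # Weighted mean in log space, transformed back
        return np.exp(np.sum(weights * np.log(values)) / total)
    if method == "harmonic":
        return total / np.sum(weights / values)
    raise ValueError(f"unknown method: {method}")

sample = [1.0, 2.0, 4.0]
means = {
    m: mean_sketch(sample, method=m)
    for m in ("arithmetic", "geometric", "harmonic")
}
```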

xarrayutils.utils.filter_1D(data, std, dim='time', dtype=None)[source]
xarrayutils.utils.lag_and_combine(ds, lags, dim='time')[source]

Creates lagged versions of the input object, combined along new lag dimension. NOTE: Lagging produces missing values at boundary. Use .fillna(…) to avoid problems with e.g. xr_linregress.

Parameters
  • ds ({xr.DataArray, xr.Dataset}) – Input object

  • lags (np.Array) – Lags to be computed and combined. Values denote number of timesteps. Negative lag indicates a shift backwards (to the left of the axis).

  • dim (str) – dimension of ds to be lagged

Returns

Lagged version of ds with additional dimension lag

Return type

{xr.DataArray, xr.Dataset}
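The effect of lagging, including the missing values at the boundary, can be illustrated with plain NumPy. The helper shift_with_nan is hypothetical, and the sign convention (negative lag moves values to the left) follows the parameter description above.

```python
import numpy as np

def shift_with_nan(values, lag):
    """Shift a 1D array by `lag` steps, padding with NaN (illustrative)."""
    values = np.asarray(values, dtype=float)
    shifted = np.full_like(values, np.nan)
    if lag > 0:
        shifted[lag:] = values[:-lag]
    elif lag < 0:
        shifted[:lag] = values[-lag:]
    else:
        shifted[:] = values
    return shifted

series = np.array([1.0, 2.0, 3.0, 4.0])
lags = np.array([-1, 0, 1])
# Stack the lagged copies along a new leading "lag" axis
lagged = np.stack([shift_with_nan(series, int(lag)) for lag in lags])
# Replace the missing boundary values before e.g. a regression
lagged_filled = np.nan_to_num(lagged, nan=0.0)
```

The last step plays the role of .fillna(…) on an xarray object; the fill value 0.0 is chosen here only for illustration.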

xarrayutils.utils.linear_trend(obj, dim)[source]

Convenience wrapper for ‘xr_linregress’. Calculates the trend per given timestep. E.g. if the data is passed as yearly values, the trend is in units/yr.

xarrayutils.utils.mask_mixedlayer(ds, mld, mask='outside', z_dim='lev', z_bounds='lev_bounds', ref_var=None, bound_dim='bnds')[source]

Remove all values from input data ds that are above the depth defined by mld. If cell bounds are given in the input data, the selection is more accurate; otherwise masking will be performed based on cell center values.

Parameters
  • ds (xr.Dataset) – Input data

  • mld (xr.DataArray) – Mixed Layer Depth input

  • mask (str, optional) – Switch that determines whether values outside (“outside”) or inside (“inside”) the mixed layer are preserved by the masking

  • z_dim (str, optional) – Depth dimension of ds, by default “lev”

  • z_bounds (str, optional) – Cell bounds coordinates along z_dim, by default “lev_bounds”

  • ref_var (str, optional) – Reference variable to broadcast against, by default None

Returns

ds with mixed layer values replaced by missing values

Return type

xr.Dataset
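The masking based on cell centers can be sketched with NumPy broadcasting. Array names, shapes and depth values below are made up for illustration, and the sketch assumes that a cell counts as inside the mixed layer when its center depth is less than or equal to the mixed layer depth.

```python
import numpy as np

lev = np.array([5.0, 15.0, 25.0, 35.0])          # cell-center depths (m)
mld = np.array([10.0, 30.0])                     # mixed layer depth per column (m)
data = np.arange(8, dtype=float).reshape(4, 2)   # dims (lev, column)

# True where the cell center lies within the mixed layer
in_mixed_layer = lev[:, np.newaxis] <= mld[np.newaxis, :]

# Keep only values below (outside) the mixed layer
masked_outside = np.where(in_mixed_layer, np.nan, data)
# Keep only values inside the mixed layer
masked_inside = np.where(in_mixed_layer, data, np.nan)
```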

xarrayutils.utils.remove_bottom_values(ds, dim='lev', fill_val=-10000000000.0)[source]

Remove the deepest values that are not nan along the dimension dim
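For a single profile, the operation can be sketched as follows. The helper remove_deepest_valid is hypothetical and assumes that depth increases with the array index.

```python
import numpy as np

def remove_deepest_valid(profile):
    """Set the deepest non-NaN value of a 1D profile to NaN (illustrative)."""
    profile = np.array(profile, dtype=float)  # copy, leave the input untouched
    valid = np.flatnonzero(~np.isnan(profile))
    if valid.size:
        profile[valid[-1]] = np.nan
    return profile

# Depth increases with index; the last two levels are already missing
profile = [1.0, 2.0, 3.0, np.nan, np.nan]
trimmed = remove_deepest_valid(profile)
```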

xarrayutils.utils.shift_lon(ds, londim, shift=360, crit=0, smaller=True, sort=True)[source]
xarrayutils.utils.sign_agreement(da, ds_ref, dim, threshold=0.75, mask=True, count_nans=True)[source]

[summary]

Parameters
  • da (xr.DataArray) – Input data

  • ds_ref (xr.DataArray) – Reference data to compare the sign to, e.g. a mean over dim

  • dim (str) – Dimension of da over which the sign agreement is evaluated

  • threshold (float, optional) – The minimum fraction of elements that have to agree along dim, by default 0.75 (75%)

  • mask (bool, optional) – If True, datapoints with all nan values along dim get masked out in the output, by default True

  • count_nans (bool, optional) – If True, nans along dim are counted towards the threshold. If False sign agreement is calculated according to non-nan values only, by default True
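The role of threshold and count_nans can be sketched in NumPy along the first axis. The helper sign_agreement_sketch is hypothetical and leaves out the mask option; it only illustrates how counting or ignoring NaNs changes the fraction that is compared to the threshold.

```python
import numpy as np

def sign_agreement_sketch(values, reference, threshold=0.75, count_nans=True):
    """Check along axis 0 whether enough elements share the sign of `reference`."""
    values = np.asarray(values, dtype=float)
    agree = np.sign(values) == np.sign(reference)   # NaN never agrees
    if count_nans:
        n = values.shape[0]
    else:
        n = np.sum(~np.isnan(values), axis=0)
    fraction = agree.sum(axis=0) / n
    return fraction >= threshold

# Four members (axis 0) at two locations (axis 1)
members = np.array([[1.0, 1.0],
                    [2.0, 2.0],
                    [3.0, np.nan],
                    [-1.0, np.nan]])
reference = np.nanmean(members, axis=0)

strict = sign_agreement_sketch(members, reference, count_nans=True)
lenient = sign_agreement_sketch(members, reference, count_nans=False)
```

At the second location only two of four members are valid: with count_nans=True the agreeing fraction is 2/4 and fails the 0.75 threshold, while with count_nans=False it is 2/2 and passes.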

xarrayutils.utils.timefilter(xr_in, steps, step_spec, timename='time', filtertype='gaussian', stdev=0.1)[source]
xarrayutils.utils.xr_detrend(b, dim='time', trend_params=None, convert_datetime=True)[source]

Removes linear trend along dimension dim from dataarray b. If no trend_params are passed (default), the linear trend is calculated using xr_linregress.

Parameters
  • b ({xr.DataArray, xr.Dataset}) – Data source to be detrended.

  • dim (str) – Dimension along which to remove linear trend

  • trend_params – Precomputed output of xr_linregress. This can be useful for large datasets where intermediate results are saved already. Defaults to None, meaning the linear trend is computed within the function.

  • convert_datetime (bool) – If true (default), the dimension dim is converted from a datetime to float.
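Conceptually, detrending means fitting a straight line and subtracting it. The NumPy sketch below uses np.polyfit on a synthetic series; whether the library removes the intercept together with the slope is not stated here, so subtracting the full fitted line is an assumption of this example.

```python
import numpy as np

t = np.arange(10, dtype=float)
signal = np.sin(t)                 # part we want to keep
series = 0.5 * t + 2.0 + signal    # signal plus a linear trend

# Fit a straight line and subtract it
slope, intercept = np.polyfit(t, series, deg=1)
detrended = series - (slope * t + intercept)
```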

xarrayutils.utils.xr_linregress(x, y, dim='time')[source]

Calculates linear regression along dimension dim. Results are equivalent to scipy.stats.linregress.

Parameters
  • x ({xr.DataArray}) – Independent variable for linear regression. E.g. time.

  • y ({xr.DataArray, xr.Dataset}) – Dependent variable.

  • dim (str) – Dimension over which to perform linear regression. Must be present in both x and y (the default is ‘time’).

Returns

Returns a dataarray containing the parameter values for each data_variable in y. The naming convention follows scipy.stats.linregress

Return type

type(y)
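The core of a least-squares line fit can be written with a few reductions, which is what makes it easy to apply along one dimension of a larger array. The sketch below is a 1D NumPy illustration with a hypothetical helper name; it computes only slope, intercept and r_value and omits the p-value and standard error that scipy.stats.linregress also reports.

```python
import numpy as np

def linregress_sketch(x, y):
    """Slope, intercept and correlation of a least-squares line fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_anom = x - x.mean()
    y_anom = y - y.mean()
    cov_xy = np.mean(x_anom * y_anom)
    slope = cov_xy / np.mean(x_anom**2)
    intercept = y.mean() - slope * x.mean()
    r_value = cov_xy / (x.std() * y.std())
    return slope, intercept, r_value

x = np.arange(5)
y = 2.0 * x + 1.0
slope, intercept, r_value = linregress_sketch(x, y)
```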

xarrayutils.plotting.axis_arrow(ax, x_loc, text, arrowprops={}, **kwargs)[source]

Puts an arrow pointing at x_loc onto (but outside of) the x-axis of a plot. For now this only works on the x-axis and at the top. Modify when necessary.

Parameters
  • ax (matplotlib.axis) – axis to plot on.

  • x_loc (type) – Position of the arrow (in units of ax x-axis).

  • text (str) – Text next to arrow.

  • arrowprops (dict) – Additional arguments to pass to arrowprops. See mpl.axes.annotate for details.

  • kwargs – additional keyword arguments passed to ax.annotate

xarrayutils.plotting.box_plot(box, ax=None, split_detection='True', **kwargs)[source]

Plots box despite coordinate discontinuities.

Parameters
  • box (np.array) – Defines the box in the coordinates of the current axis. Describing the box corners [x1, x2, y1, y2]

  • ax (matplotlib.axis) – axis for plotting. Defaults to plt.gca()

  • kwargs (optional) – anything that can be passed to plot can be put as kwarg

xarrayutils.plotting.box_plot_dict(di, xdim='lon', ydim='lat', **kwargs)[source]

plot box from xarray selection dict e.g. {‘xdim’:slice(a, b), ‘ydim’:slice(c,d), …}

xarrayutils.plotting.center_lim(ax, which='y')[source]
xarrayutils.plotting.depth_logscale(ax, yscale=400, ticks=None)[source]
xarrayutils.plotting.dict2box(di, xdim='lon', ydim='lat')[source]
xarrayutils.plotting.draw_dens_contours_teos10(sigma='sigma0', add_labels=True, ax=None, density_grid=20, dens_interval=1.0, salt_on_x=True, slim=None, tlim=None, contour_kwargs={}, c_label_kwargs={}, **kwargs)[source]

draws density contours on the current plot. Assumes that the salinity and temperature values are given as SA and CT. Needs documentation…

xarrayutils.plotting.letter_subplots(axes, start_idx=0, box_color=None, labels=None, **kwargs)[source]

Adds panel letters in boxes to each element of axes in the upper left corner.

Parameters
  • axes (list, array_like) – List or array of matplotlib axes objects.

  • start_idx (type) – Starting index in the alphabet (e.g. 0 is ‘a’).

  • box_color (type) – Color of the box behind each letter (the default is None).

  • labels (list) – List of strings used as labels (if None (default), uses lowercase alphabet followed by uppercase alphabet)

  • **kwargs (type) – kwargs passed to matplotlib.axis.text
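The default label sequence described for labels and start_idx can be sketched with the standard library alone. The helper panel_labels is hypothetical and only produces the strings; it does not draw anything.

```python
import string

def panel_labels(n_axes, start_idx=0, labels=None):
    """Return one label per axis, defaulting to a-z followed by A-Z."""
    if labels is None:
        labels = string.ascii_lowercase + string.ascii_uppercase
    return list(labels[start_idx:start_idx + n_axes])

first_three = panel_labels(3)
shifted = panel_labels(3, start_idx=1)
```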

xarrayutils.plotting.linear_piecewise_scale(cut, scale, ax=None, axis='y', scaled_half='upper', add_cut_line=False)[source]

This function sets a piecewise linear scaling for a given axis to highlight e.g. processes in the upper ocean vs deep ocean.

Parameters
  • cut (float) – value along the chosen axis used as transition between the two linear scalings.

  • scale (float) – scaling coefficient for the chosen axis portion (determined by axis and scaled_half). A higher number means the chosen portion of the axis will be more compressed. Must be positive. 0 means no compression.

  • ax (matplotlib.axis, optional) – The plot axis object. Defaults to current matplotlib axis

  • axis (str, optional) – Which axis of the plot to act on. * ‘y’ (Default) * ‘x’

  • scaled_half (str, optional) – Determines which half of the axis is scaled (compressed). * ‘upper’ (default). Values larger than cut are compressed * ‘lower’. Values smaller than cut are compressed

Returns

ax_scaled

Return type

matplotlib.axis

xarrayutils.plotting.map_util_plot(ax, land_color='0.7', coast_color='0.3', lake_alpha=0.5, labels=False)[source]

Helper tool to add good default map to cartopy axes.

Parameters
  • ax (cartopy.geoaxes (not sure this is right)) – The axis to plot on (must be a cartopy axis).

  • land_color (type) – Color of land fill (the default is ‘0.7’).

  • coast_color (type) – Color of coastline (the default is ‘0.3’).

  • lake_alpha (type) – Transparency of lakes (the default is 0.5).

  • labels (type) – Not implemented.

xarrayutils.plotting.plot_line_shaded_std(x, y, std_y, horizontal=True, ax=None, line_kwargs={}, fill_kwargs={})[source]

Plot wrapper to draw line for y and shaded patch according to std_y. The shading represents one std on each side of the line…

Parameters
  • x (numpy.array or xr.DataArray) – Coordinate.

  • y (numpy.array or xr.DataArray) – line data.

  • std_y (numpy.array or xr.DataArray) – std corresponding to y.

  • horizontal (bool) – Determines if the plot is horizontal or vertical (e.g. x is plotted on the y-axis).

  • ax (matplotlib.axes) – Matplotlib axes object to plot on (the default is plt.gca()).

  • line_kwargs (dict) – optional parameters for line plot.

  • fill_kwargs (dict) – optional parameters for std fill plot.

Returns

Tuple of line and patch objects.

Return type

(ll, ff)

xarrayutils.plotting.same_y_range(axes)[source]

Adjusts multiple axes so that the range of y values is the same everywhere, but not the actual values.

Parameters

axes (np.array) – An array of matplotlib.axes objects produced by e.g. plt.subplots()
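One way to give several axes the same y range without forcing the same values is to widen every range to the largest one while keeping its center. The sketch below works on plain (low, high) tuples; the helper name equal_range_limits and the choice to keep each center fixed are assumptions for illustration, not a description of the library's internals.

```python
def equal_range_limits(limits):
    """Give every (low, high) pair the largest range, keeping its center."""
    widest = max(high - low for low, high in limits)
    return [
        ((low + high) / 2 - widest / 2, (low + high) / 2 + widest / 2)
        for low, high in limits
    ]

# Two panels with ranges of 10 and 4 units
new_limits = equal_range_limits([(0.0, 10.0), (100.0, 104.0)])
```

With matplotlib, the input tuples would come from ax.get_ylim() and the results would be applied with ax.set_ylim().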

xarrayutils.plotting.shaded_line_plot(da, dim, ax=None, horizontal=True, spreads=None, alphas=[0.25, 0.4], spread_style='std', line_kwargs={}, fill_kwargs={}, **kwargs)[source]

Produces a line plot with shaded intervals based on the spread of da in dim.

Parameters
  • da (xr.DataArray) – The input data. Needs to be 2 dimensional, so that when dim is reduced, it is a line plot.

  • dim (str) – Dimension of da which is used to calculate spread

  • ax (matplotlib.axes) – Matplotlib axes object to plot on (the default is plt.gca()).

  • horizontal (bool) – Determines if the plot is horizontal or vertical (e.g. x is plotted on the y-axis).

  • spreads (np.array, optional) – Values specifying the ‘spread-values’, dependent on spread_style. Defaults to shading the range of 1 and 2 standard deviations in dim

  • alphas (np.array, optional) – Transparency values of the shaded ranges. Defaults to [0.25, 0.4].

  • spread_style (str) –

    Metric used to define spread on dim. Options:

    ’std’: Calculates standard deviation along dim and shading indicates multiples of std centered on the mean

    ‘quantile’: Calculates quantile ranges. An input of spreads=[0.2,0.5] would show an inner shading for the 40th-60th percentile, and an outer shading for the 25th-75th percentile, centered on the 50th quantile (~median). Must be within [0,100].

  • line_kwargs (dict) – optional parameters for line plot.

  • fill_kwargs (dict) – optional parameters for std fill plot.

  • **kwargs – Keyword arguments passed to both line plot and fill_between.

Example
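The quantile bands described for spread_style can be reproduced with NumPy alone. This sketch assumes each spread value is the width of a band centered on the median, expressed as a fraction, which matches the 0.2 and 0.5 example in the parameter description; the helper quantile_bounds and the random test data are illustrative only, and no plotting is done.

```python
import numpy as np

def quantile_bounds(spread):
    """Lower and upper quantile of a band of width `spread` around the median."""
    return 0.5 - spread / 2, 0.5 + spread / 2

rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 5))   # dims (member, x)

bands = {}
for spread in [0.2, 0.5]:
    q_low, q_high = quantile_bounds(spread)
    # Reduce over the member axis, leaving one value per x position
    lower, upper = np.quantile(samples, [q_low, q_high], axis=0)
    bands[spread] = (lower, upper)
```

The arrays in bands could then be passed to matplotlib's fill_between to draw the inner and outer shading around the median line.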

xarrayutils.plotting.tsdiagram(salt, temp, color=None, size=None, lon=None, lat=None, pressure=None, convert_teos10=True, ts_kwargs={}, ax=None, fig=None, draw_density_contours=True, draw_cbar=True, add_labels=True, **kwargs)[source]
xarrayutils.plotting.xr_violinplot(ds, ax=None, x_dim='xt_ocean', width=1, color='0.5')[source]

Wrapper of matplotlib violinplot for xarray.DataArray.

Parameters
  • ds (xr.DataArray) – Input data.

  • ax (matplotlib.axis) – Plotting axis (the default is None).

  • x_dim (str) – dimension that defines the x-axis of the plot (the default is ‘xt_ocean’).

  • width (float) – Scaling width of each violin (the default is 1).

  • color (type) – Color of the violin (the default is ‘0.5’).

Returns

Description of returned object.

Return type

type

xarrayutils.file_handling.checkpath(func)[source]
xarrayutils.file_handling.file_exist_check(filepath, check_zarr_consolidated_complete=True)[source]

Check if a file exists, with some extra checks for zarr files

Parameters
  • filepath (path) – path to the file to check

  • check_zarr_consolidated_complete (bool, optional) – Check if .zmetadata file was written (consolidated metadata), by default True
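The extra zarr check can be sketched with pathlib. The helper exists_sketch is hypothetical; it assumes that a store is recognized by a .zarr suffix and that a completely written, consolidated store has a .zmetadata file at its top level.

```python
import pathlib
import tempfile

def exists_sketch(filepath, check_zarr_consolidated_complete=True):
    """Existence check with an extra completeness test for zarr stores."""
    path = pathlib.Path(filepath)
    if not path.exists():
        return False
    if path.suffix == ".zarr" and check_zarr_consolidated_complete:
        # Consolidated zarr stores carry a .zmetadata file at the top level
        return (path / ".zmetadata").exists()
    return True

with tempfile.TemporaryDirectory() as tmp:
    store = pathlib.Path(tmp) / "example.zarr"
    store.mkdir()
    incomplete = exists_sketch(store)          # directory exists, metadata missing
    (store / ".zmetadata").write_text("{}")
    complete = exists_sketch(store)            # metadata now present
```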

xarrayutils.file_handling.maybe_create_folder(path)[source]
xarrayutils.file_handling.temp_write_split(ds_in, folder, method='dimension', dim='time', split_interval=40, zarr_write_kwargs={}, zarr_read_kwargs={}, file_name_pattern='temp_write_split', verbose=False)[source]

[summary]

Parameters
  • ds_in (xr.Dataset) – input

  • folder (pathlib.Path) – Target folder for temporary files

  • method (str, optional) – Defines if the temporary files are split by an increment along a certain dimension(“dimension”) or by the variables of the dataset (“variables”), by default “dimension”

  • dim (str, optional) – Dimension to split along (only relevant for method=”dimension”), by default “time”

  • split_interval (int, optional) – Steps along dim for each temporary file (only relevant for method=”dimension”), by default 40

  • zarr_write_kwargs (dict, optional) – Kwargs passed to xr.to_zarr(), by default {}

  • zarr_read_kwargs (dict, optional) – Kwargs passed to xr.open_zarr(), by default {}

  • file_name_pattern (str, optional) – Pattern used to name the temporary files, by default “temp_write_split”

  • verbose (bool, optional) – Activates printing, by default False

Returns

  • ds_out (xr.Dataset) – reloaded dataset, with value identical to ds_in

  • flist (list) – List of paths to temporary datasets written.
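For method="dimension", the split amounts to cutting the chosen dimension into consecutive index ranges of length split_interval. A minimal sketch of that bookkeeping, with a hypothetical helper name and no file I/O:

```python
def split_slices(length, split_interval):
    """Consecutive index slices of at most `split_interval` steps."""
    return [
        slice(start, min(start + split_interval, length))
        for start in range(0, length, split_interval)
    ]

# 100 time steps with the default interval of 40 give three pieces
slices = split_slices(100, 40)
```

Each slice could be applied with ds.isel(time=s) to obtain the part of the dataset written to one temporary file; the last piece is shorter when the length is not a multiple of the interval.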

xarrayutils.file_handling.total_nested_size(nested)[source]

Calculate the size of a nested dict full of xarray objects

Parameters

nested (dict) – Input dictionary. Can have arbitrary nesting levels

Returns

total size in bytes

Return type

float
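The recursion over arbitrary nesting levels can be sketched as follows. NumPy arrays stand in for xarray objects here because both expose an nbytes attribute; the helper name nested_nbytes is hypothetical.

```python
import numpy as np

def nested_nbytes(nested):
    """Sum the nbytes of all array-like leaves in a nested dict."""
    total = 0
    for value in nested.values():
        if isinstance(value, dict):
            total += nested_nbytes(value)   # descend one level
        else:
            total += value.nbytes
    return total

nested = {"a": np.zeros(10), "b": {"c": np.zeros(5), "d": {"e": np.zeros(1)}}}
size = nested_nbytes(nested)
```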

xarrayutils.file_handling.write(ds, path, print_size=True, consolidated=True, **kwargs)[source]

Convenience function to save large datasets. Performs the following additional steps (compared to e.g. xr.to_netcdf() or xr.to_zarr())

  1. Checks for existing files (with special checks for zarr files)

  2. Handles existing files via overwrite argument.

  3. Checks attributes for incompatible values

  4. Optional: Prints size of saved dataset

  5. Optional: Returns the saved dataset loaded from disk (e.g. for quality control)

Parameters
  • ds (xr.Dataset) – Input dataset

  • path (pathlib.Path) – filepath to save to. Ending determines the output type (.nc for netcdf, .zarr for zarr)

  • print_size (bool, optional) – If true prints the size of the dataset before saving, by default True

  • reload_saved (bool, optional) – If true the returned datasets is opened from the written file, otherwise the input is returned, by default True

  • open_kwargs (dict) – Arguments passed to the reloading function (either xr.open_dataset or xr.open_zarr based on filename)

  • write_kwargs (dict) – Arguments passed to the writing function (either xr.to_netcdf or xr.to_zarr based on filename)

  • overwrite (bool, optional) – If True, overwrite existing files, by default False

  • check_zarr_consolidated_complete (bool, optional) – If True check if .zmetadata is present in zarr store, and overwrite if not present, by default False

Returns

Returns either the unmodified input dataset or a reloaded version from the written file

Return type

xr.Dataset