API
- xarrayutils.utils.aggregate(da, blocks, func=<function nanmean>, debug=False)[source]¶
Performs efficient block averaging in one or multiple dimensions. Only works on regular grid dimensions.
- Parameters
da (xarray.DataArray) – Input array. Must be backed by a dask array.
blocks (list) – List of tuples containing the dimension and interval to aggregate over
func (function) – Aggregation function. Defaults to numpy.nanmean
- Returns
da_agg – Aggregated array
- Return type
xarray.DataArray
Examples
>>> from xarrayutils import aggregate
>>> import numpy as np
>>> import xarray as xr
>>> import matplotlib.pyplot as plt
>>> %matplotlib inline
>>> import dask.array as da
>>> x = np.arange(-10,10)
>>> y = np.arange(-10,10)
>>> xx,yy = np.meshgrid(x,y)
>>> z = xx**2-yy**2
>>> a = xr.DataArray(da.from_array(z, chunks=(20, 20)), coords={'x':x,'y':y}, dims=['y','x'])
>>> print(a)
<xarray.DataArray 'array-7e422c91624f207a5f7ebac426c01769' (y: 20, x: 20)>
dask.array<array-7…, shape=(20, 20), dtype=int64, chunksize=(20, 20)>
Coordinates:
    y (y) int64 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9
    x (x) int64 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9
>>> blocks = [('x',2),('y',10)]
>>> a_coarse = aggregate(a,blocks,func=np.mean)
>>> print(a_coarse)
<xarray.DataArray 'array-7e422c91624f207a5f7ebac426c01769' (y: 2, x: 10)>
dask.array<coarsen…, shape=(2, 10), dtype=float64, chunksize=(2, 10)>
Coordinates:
    y (y) int64 -10 0
    x (x) int64 -10 -8 -6 -4 -2 0 2 4 6 8
Attributes:
    Coarsened with: <function mean at 0x111754230>
    Coarsenblocks: [('x', 2), ('y', 10)]
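The idea behind block aggregation on a regular grid can be sketched in plain NumPy. This is only an illustration of the technique, not the implementation used by aggregate; the helper name block_reduce is hypothetical, and it assumes a 2D array whose shape is divisible by the block sizes.

```python
import numpy as np


def block_reduce(arr, block_sizes, func=np.nanmean):
    """Reduce a 2D array over non-overlapping blocks (hypothetical helper)."""
    ny, nx = arr.shape
    by, bx = block_sizes
    if ny % by or nx % bx:
        raise ValueError("array shape must be divisible by the block sizes")
    # Split each axis into (number of blocks, block length),
    # then reduce over the two block-length axes.
    blocks = arr.reshape(ny // by, by, nx // bx, bx)
    return func(blocks, axis=(1, 3))


x = np.arange(-10, 10)
y = np.arange(-10, 10)
xx, yy = np.meshgrid(x, y)
z = xx**2 - yy**2

# Blocks of 10 points along y and 2 points along x.
z_coarse = block_reduce(z, (10, 2), func=np.mean)
print(z_coarse.shape)  # (2, 10)
```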
- xarrayutils.utils.aggregate_w_nanmean(da, weights, blocks, **kwargs)[source]¶
Weighted nanmean for xarray objects.
- xarrayutils.utils.composite(data, index, bounds)[source]¶
Composites a DataArray according to index.
- Parameters
data (xarray.DataArray) –
index (xarray.DataArray) – Time series matching one dimension of ‘data’. Values lower (higher) than ‘bounds’ are composited into an additional coordinate.
bounds (int or array_like) – Thresholds that sort the values of ‘index’ into [‘low’, ‘neutral’, ‘high’]. If given as an int, bounds are computed as [-std(index), std(index)] * bounds.
- Returns
composited_array – xarray-like data with an additional composite coordinate [‘low’, ‘neutral’, ‘high’] based on ‘bounds’
- Return type
array_like
Examples
TODO
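The classification step described for bounds can be illustrated with a small NumPy sketch. It is not the composite implementation: classify_index is a hypothetical helper, the sample values are made up, and the use of np.nanstd for the integer case is an assumption.

```python
import numpy as np


def classify_index(index, bounds):
    """Label each index value as 'low', 'neutral' or 'high' (hypothetical helper)."""
    index = np.asarray(index, dtype=float)
    if np.isscalar(bounds):
        # Assumption: a scalar multiplies the standard deviation of the index.
        std = np.nanstd(index)
        lower, upper = -std * bounds, std * bounds
    else:
        lower, upper = bounds
    labels = np.full(index.shape, "neutral")
    labels[index < lower] = "low"
    labels[index > upper] = "high"
    return labels


index = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
data = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

labels = classify_index(index, bounds=[-1.0, 1.0])
# Average the data separately for each composite class.
composites = {name: data[labels == name].mean() for name in ("low", "neutral", "high")}
print(composites)
```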
- xarrayutils.utils.concat_dim_da(data, name)[source]¶
Creates an xarray.DataArray to label the concat dim in xarray.concat. data is the dimension array and name is the name of the dimension.
- xarrayutils.utils.corrmap(a, b, shifts=0, a_x_dim='i', a_y_dim='j', a_x_coord=None, a_y_coord=None, b_x_dim='i', b_y_dim='j', b_x_coord=None, b_y_coord=None, t_dim='time', debug=True)[source]¶
a – input
b – target
TODO This thing is slow. I can most likely rewrite this with numpy.apply_along_axis
- xarrayutils.utils.dll_dist(dlon, dlat, lon, lat)[source]¶
Converts lat/lon differentials into distances
- Parameters
dlon (xarray.DataArray) – Longitude differentials
dlat (xarray.DataArray) – Latitude differentials
lon (xarray.DataArray) – Longitude values
lat (xarray.DataArray) – Latitude values
- Returns
dx (xarray.DataArray) – Distance inferred from dlon
dy (xarray.DataArray) – Distance inferred from dlat
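A common way to convert lat/lon differentials into distances is a spherical-Earth approximation, sketched below in NumPy. This is an assumption about the general approach and not necessarily what dll_dist does; the helper name, the radius value, and the omission of the lon argument are choices made for this illustration.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6  # assumed mean Earth radius in meters


def dll_to_distance(dlon, dlat, lat):
    """Convert differentials in degrees to meters on a sphere (hypothetical helper)."""
    dlon, dlat, lat = (np.asarray(v, dtype=float) for v in (dlon, dlat, lat))
    meters_per_degree = np.pi * EARTH_RADIUS_M / 180.0
    # Meridians converge towards the poles, so zonal distance shrinks with cos(lat).
    dx = dlon * meters_per_degree * np.cos(np.deg2rad(lat))
    dy = dlat * meters_per_degree
    return dx, dy


dx, dy = dll_to_distance(dlon=1.0, dlat=1.0, lat=[0.0, 60.0])
print(dx, dy)
```

With these assumptions, one degree of longitude at 60° latitude spans half the distance it does at the equator.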
- xarrayutils.utils.extractBox_dict(ds, box, concat_wrap=True)[source]¶
Advanced box extraction from xarray Dataset
- xarrayutils.utils.fancymean(raw, dim=None, axis=None, method='arithmetic', weights=None, debug=False)[source]¶
Extended mean function for xarray
Applies various methods to estimate mean values {arithmetic,geometric,harmonic} along the specified dimension with optional weighting values, which can be a coordinate in the passed xarray structure
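For reference, the three weighted means can be written out in NumPy as follows. This is a minimal sketch of the standard definitions for a 1D input, not the fancymean implementation; weighted_mean is a hypothetical name.

```python
import numpy as np


def weighted_mean(values, weights=None, method="arithmetic"):
    """Weighted arithmetic, geometric or harmonic mean of a 1D array (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    weights = np.ones_like(values) if weights is None else np.asarray(weights, dtype=float)
    wsum = weights.sum()
    if method == "arithmetic":
        return (weights * values).sum() / wsum
    if method == "geometric":
        # exp of the weighted mean of logs; requires strictly positive values
        return np.exp((weights * np.log(values)).sum() / wsum)
    if method == "harmonic":
        return wsum / (weights / values).sum()
    raise ValueError(f"unknown method: {method}")


values = [1.0, 2.0, 4.0]
means = {m: weighted_mean(values, method=m) for m in ("arithmetic", "geometric", "harmonic")}
print(means)
```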
- xarrayutils.utils.lag_and_combine(ds, lags, dim='time')[source]¶
Creates lagged versions of the input object, combined along new lag dimension. NOTE: Lagging produces missing values at boundary. Use .fillna(…) to avoid problems with e.g. xr_linregress.
- Parameters
ds ({xr.DataArray, xr.Dataset}) – Input object
lags (np.Array) – Lags to be computed and combined. Values denote number of timesteps. Negative lag indicates a shift backwards (to the left of the axis).
dim (str) – dimension of ds to be lagged
- Returns
Lagged version of ds with additional dimension lag
- Return type
{xr.DataArray, xr.Dataset}
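The lagging behaviour, including the missing values at the boundaries, can be illustrated on a plain 1D NumPy array. This is a sketch of the concept rather than the lag_and_combine implementation; lag_and_stack is a hypothetical helper, and it assumes that a negative lag shifts values to the left, as stated above.

```python
import numpy as np


def lag_and_stack(series, lags):
    """Stack shifted copies of a 1D series along a new leading 'lag' axis (hypothetical helper)."""
    series = np.asarray(series, dtype=float)
    out = np.full((len(lags), series.size), np.nan)
    for i, lag in enumerate(lags):
        if lag > 0:
            out[i, lag:] = series[:-lag]   # shift right, NaN at the start
        elif lag < 0:
            out[i, :lag] = series[-lag:]   # shift left, NaN at the end
        else:
            out[i] = series
    return out


lagged = lag_and_stack([1, 2, 3, 4], lags=[-1, 0, 1])
print(lagged)

# Filling the boundary values, analogous to calling .fillna(...) on an xarray object.
filled = np.nan_to_num(lagged, nan=0.0)
```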
- xarrayutils.utils.linear_trend(obj, dim)[source]¶
Convenience wrapper for ‘xr_linregress’. Calculates the trend per given timestep. E.g. if the data is passed as yearly values, the trend is in units/yr.
- xarrayutils.utils.mask_mixedlayer(ds, mld, mask='outside', z_dim='lev', z_bounds='lev_bounds', ref_var=None, bound_dim='bnds')[source]¶
Remove all values from input data ds that are above the depth defined by mld. If cell bounds are given in the input data, the selection is more accurate; otherwise masking will be performed based on cell center values.
- Parameters
ds (xr.Dataset) – Input data
mld (xr.DataArray) – Mixed Layer Depth input
mask (str, optional) – Determines whether values outside ('outside') or inside ('inside') the mixed layer are preserved by the masking
z_dim (str, optional) – Depth dimension of ds, by default “lev”
z_bounds (str, optional) – Cell bounds coordinates along z_dim, by default “lev_bounds”
ref_var (str, optional) – Reference variable to broadcast against, by default None
- Returns
ds with mixed layer values replaced by missing values
- Return type
xr.Dataset
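The cell-center variant of this masking can be sketched in NumPy. This is only an illustration under stated assumptions: depth is positive downward, data has shape (depth, column), and mask_above_mld is a hypothetical helper that keeps values below the mixed layer. The cell-bounds logic of mask_mixedlayer is not reproduced here.

```python
import numpy as np


def mask_above_mld(data, depth, mld):
    """Replace values shallower than the mixed layer depth with NaN (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    depth = np.asarray(depth, dtype=float)
    mld = np.asarray(mld, dtype=float)
    # Broadcast depth (nz,) against mld (ncol,) to a (nz, ncol) boolean mask.
    below = depth[:, np.newaxis] > mld[np.newaxis, :]
    return np.where(below, data, np.nan)


depth = np.array([5.0, 15.0, 25.0, 35.0])  # cell centers in meters
mld = np.array([10.0, 30.0])               # one mixed layer depth per column
data = np.ones((4, 2))

masked = mask_above_mld(data, depth, mld)
print(masked)
```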
- xarrayutils.utils.remove_bottom_values(ds, dim='lev', fill_val=-10000000000.0)[source]¶
Remove the deepest values that are not nan along the dimension dim
- xarrayutils.utils.sign_agreement(da, ds_ref, dim, threshold=0.75, mask=True, count_nans=True)[source]¶
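Evaluates where the sign of da agrees with the sign of ds_ref for at least a threshold fraction of the elements along dim.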
- Parameters
da (xr.DataArray) – Input data
ds_ref (xr.DataArray) – Reference data to compare the sign to, e.g. a mean over dim
dim (str) – Dimension of da over which the sign agreement is evaluated
threshold (float, optional) – The minimum fraction of elements that have to agree along dim, by default 0.75 (75%)
mask (bool, optional) – If True, datapoints with all nan values along dim get masked out in the output, by default True
count_nans (bool, optional) – If True, nans along dim are counted towards the threshold. If False sign agreement is calculated according to non-nan values only, by default True
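How threshold and count_nans interact can be shown with a small NumPy sketch. It follows the parameter descriptions above but is not the sign_agreement implementation; the helper name and the sample values are made up for illustration.

```python
import numpy as np


def sign_agreement_mask(data, ref, axis=0, threshold=0.75, count_nans=True):
    """True where enough elements along axis share the sign of ref (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    ref = np.asarray(ref, dtype=float)
    valid = ~np.isnan(data)
    agree = (np.sign(data) == np.sign(np.expand_dims(ref, axis))) & valid
    n_agree = agree.sum(axis=axis)
    # Either all elements count towards the fraction, or only the non-NaN ones.
    n_total = data.shape[axis] if count_nans else valid.sum(axis=axis)
    return n_agree / n_total >= threshold


# Four "members" (axis 0) at two locations (axis 1).
data = np.array([[1.0, -1.0], [2.0, 1.0], [3.0, -2.0], [-1.0, np.nan]])
ref = np.nanmean(data, axis=0)

strict = sign_agreement_mask(data, ref, threshold=0.75)
with_nans = sign_agreement_mask(data, ref, threshold=0.6, count_nans=True)
without_nans = sign_agreement_mask(data, ref, threshold=0.6, count_nans=False)
print(strict, with_nans, without_nans)
```

At the second location two of the three valid members agree with the reference, so the location passes a 0.6 threshold only when the NaN is excluded from the count.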
- xarrayutils.utils.timefilter(xr_in, steps, step_spec, timename='time', filtertype='gaussian', stdev=0.1)[source]¶
- xarrayutils.utils.xr_detrend(b, dim='time', trend_params=None, convert_datetime=True)[source]¶
Removes linear trend along dimension dim from dataarray b. If no trend_params are passed (default), the linear trend is calculated using xr_linregress.
- Parameters
b ({xr.DataArray, xr.Dataset}) – Data source to be detrended.
dim (str) – Dimension along which to remove linear trend.
trend_params – Precomputed output of xr_linregress. This can be useful for large datasets where intermediate results are saved already. Defaults to None, meaning the linear trend is computed within the function.
convert_datetime (bool) – If true (default), the dimension dim is converted from a datetime to float.
- xarrayutils.utils.xr_linregress(x, y, dim='time')[source]¶
Calculates linear regression along dimension dim. Results are equivalent to scipy.stats.linregress.
- Parameters
x ({xr.DataArray}) – Independent variable for linear regression. E.g. time.
y ({xr.DataArray, xr.Dataset}) – Dependent variable.
dim (str) – Dimension over which to perform linear regression. Must be present in both x and y (the default is ‘time’).
- Returns
Returns a dataarray containing the parameter values for each data_variable in y. The naming convention follows scipy.stats.linregress
- Return type
type(y)
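The regression itself is ordinary least squares applied independently along one axis. The NumPy sketch below shows the closed-form slope, intercept and correlation for several series at once, and how the fit can be subtracted to detrend. It is an illustration of the math, not the xr_linregress code; linregress_along_axis is a hypothetical name.

```python
import numpy as np


def linregress_along_axis(x, y, axis=-1):
    """Least-squares slope, intercept and r-value of y against x along one axis (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.moveaxis(np.asarray(y, dtype=float), axis, -1)
    x_anom = x - x.mean()
    y_anom = y - y.mean(axis=-1, keepdims=True)
    cov = (x_anom * y_anom).sum(axis=-1)
    slope = cov / (x_anom**2).sum()
    intercept = y.mean(axis=-1) - slope * x.mean()
    r_value = cov / np.sqrt((x_anom**2).sum() * (y_anom**2).sum(axis=-1))
    return slope, intercept, r_value


time = np.arange(10.0)
y = np.stack([2.0 * time + 1.0, -0.5 * time + 3.0])  # two series with known trends

slope, intercept, r_value = linregress_along_axis(time, y)
print(slope, intercept, r_value)

# Detrending subtracts the fitted line from each series.
detrended = y - (slope[:, np.newaxis] * time + intercept[:, np.newaxis])
```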
- xarrayutils.plotting.axis_arrow(ax, x_loc, text, arrowprops={}, **kwargs)[source]¶
Puts an arrow pointing at x_loc onto (but outside of) the x-axis of a plot. For now this only works on the x-axis and at the top. Modify when necessary.
- Parameters
ax (matplotlib.axis) – axis to plot on.
x_loc (type) – Position of the arrow (in units of ax x-axis).
text (str) – Text next to arrow.
arrowprops (dict) – Additional arguments to pass to arrowprops. See mpl.axes.annotate for details.
kwargs – additional keyword arguments passed to ax.annotate
- xarrayutils.plotting.box_plot(box, ax=None, split_detection='True', **kwargs)[source]¶
Plots a box despite coordinate discontinuities.
- Parameters
box (np.array) – Defines the box in the coordinates of the current axis, describing the box corners [x1, x2, y1, y2].
ax (matplotlib.axis) – Axis for plotting. Defaults to plt.gca().
kwargs (optional) – Anything that can be passed to plot can be given as a kwarg.
- xarrayutils.plotting.box_plot_dict(di, xdim='lon', ydim='lat', **kwargs)[source]¶
plot box from xarray selection dict e.g. {‘xdim’:slice(a, b), ‘ydim’:slice(c,d), …}
- xarrayutils.plotting.draw_dens_contours_teos10(sigma='sigma0', add_labels=True, ax=None, density_grid=20, dens_interval=1.0, salt_on_x=True, slim=None, tlim=None, contour_kwargs={}, c_label_kwargs={}, **kwargs)[source]¶
draws density contours on the current plot. Assumes that the salinity and temperature values are given as SA and CT. Needs documentation…
- xarrayutils.plotting.letter_subplots(axes, start_idx=0, box_color=None, labels=None, **kwargs)[source]¶
Adds panel letters in boxes to each element of axes in the upper left corner.
- Parameters
axes (list, array_like) – List or array of matplotlib axes objects.
start_idx (type) – Starting index in the alphabet (e.g. 0 is ‘a’).
box_color (type) – Color of the box behind each letter (the default is None).
labels (list) – List of strings used as labels (if None (default), uses lowercase alphabet followed by uppercase alphabet)
**kwargs (type) – kwargs passed to matplotlib.axis.text
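The default label sequence described for labels and start_idx can be reproduced with the standard library. This sketch only generates the label strings and does not draw anything; panel_labels is a hypothetical helper, and applying start_idx to user-supplied labels as well is an assumption.

```python
import string


def panel_labels(n_axes, start_idx=0, labels=None):
    """Return one label per axis, starting at start_idx (hypothetical helper)."""
    if labels is None:
        # Lowercase alphabet followed by uppercase alphabet.
        labels = list(string.ascii_lowercase + string.ascii_uppercase)
    return list(labels[start_idx:start_idx + n_axes])


print(panel_labels(3))                # ['a', 'b', 'c']
print(panel_labels(2, start_idx=25))  # ['z', 'A']
```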
- xarrayutils.plotting.linear_piecewise_scale(cut, scale, ax=None, axis='y', scaled_half='upper', add_cut_line=False)[source]¶
This function sets a piecewise linear scaling for a given axis to highlight e.g. processes in the upper ocean vs deep ocean.
- Parameters
cut (float) – value along the chosen axis used as transition between the two linear scalings.
scale (float) – scaling coefficient for the chosen axis portion (determined by axis and scaled_half). A higher number means the chosen portion of the axis will be more compressed. Must be positive. 0 means no compression.
ax (matplotlib.axis, optional) – The plot axis object. Defaults to current matplotlib axis
axis (str, optional) – Which axis of the plot to act on. * ‘y’ (Default) * ‘x’
scaled_half (str, optional) – Determines which half of the axis is scaled (compressed). * ‘upper’ (default). Values larger than cut are compressed * ‘lower’. Values smaller than cut are compressed
- Returns
ax_scaled
- Return type
matplotlib.axis
- xarrayutils.plotting.map_util_plot(ax, land_color='0.7', coast_color='0.3', lake_alpha=0.5, labels=False)[source]¶
Helper tool to add good default map to cartopy axes.
- Parameters
ax (cartopy.geoaxes (not sure this is right)) – The axis to plot on (must be a cartopy axis).
land_color (type) – Color of land fill (the default is ‘0.7’).
coast_color (type) – Color of coastline (the default is ‘0.3’).
lake_alpha (type) – Transparency of lakes (the default is 0.5).
labels (type) – Not implemented.
- xarrayutils.plotting.plot_line_shaded_std(x, y, std_y, horizontal=True, ax=None, line_kwargs={}, fill_kwargs={})[source]¶
Plot wrapper to draw line for y and shaded patch according to std_y. The shading represents one std on each side of the line…
- Parameters
x (numpy.array or xr.DataArray) – Coordinate.
y (numpy.array or xr.DataArray) – line data.
std_y (numpy.array or xr.DataArray) – std corresponding to y.
horizontal (bool) – Determines if the plot is horizontal or vertical (e.g. x is plotted on the y-axis).
ax (matplotlib.axes) – Matplotlib axes object to plot on (the default is plt.gca()).
line_kwargs (dict) – optional parameters for line plot.
fill_kwargs (dict) – optional parameters for std fill plot.
- Returns
Tuple of line and patch objects.
- Return type
(ll, ff)
- xarrayutils.plotting.same_y_range(axes)[source]¶
Adjusts multiple axes so that the range of y values is the same everywhere, but not the actual values.
- Parameters
axes (np.array) – An array of matplotlib.axes objects produced by e.g. plt.subplots()
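One way to give several axes the same y range without forcing the same values is to widen every axis to the largest span while keeping each one centered on its own midpoint. The sketch below works on plain (lower, upper) tuples, as returned by ax.get_ylim(); whether same_y_range centers the limits this way is an assumption, and equalize_spans is a hypothetical helper.

```python
def equalize_spans(limits):
    """Widen each (lower, upper) pair to the largest span, keeping its center (hypothetical helper)."""
    max_span = max(hi - lo for lo, hi in limits)
    new_limits = []
    for lo, hi in limits:
        center = 0.5 * (lo + hi)
        new_limits.append((center - max_span / 2, center + max_span / 2))
    return new_limits


limits = [(0.0, 1.0), (10.0, 14.0)]
new_limits = equalize_spans(limits)
print(new_limits)  # [(-1.5, 2.5), (10.0, 14.0)]
```

With matplotlib, the resulting pairs could be applied to each axis via ax.set_ylim(lower, upper).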
- xarrayutils.plotting.shaded_line_plot(da, dim, ax=None, horizontal=True, spreads=None, alphas=[0.25, 0.4], spread_style='std', line_kwargs={}, fill_kwargs={}, **kwargs)[source]¶
Produces a line plot with shaded intervals based on the spread of da in dim.
- Parameters
da (xr.DataArray) – The input data. Needs to be 2 dimensional, so that when dim is reduced, it is a line plot.
dim (str) – Dimension of da which is used to calculate spread
ax (matplotlib.axes) – Matplotlib axes object to plot on (the default is plt.gca()).
horizontal (bool) – Determines if the plot is horizontal or vertical (e.g. x is plotted on the y-axis).
spreads (np.array, optional) – Values specifying the ‘spread-values’, dependent on spread_style. Defaults to shading the range of 1 and 2 standard deviations in dim
alphas (np.array, optional) – Transparency values of the shaded ranges. Defaults to [0.25, 0.4].
spread_style (str) –
Metric used to define spread on dim. Options:
'std': Calculates standard deviation along dim and shading indicates multiples of std centered on the mean
'quantile': Calculates quantile ranges. An input of spreads=[0.2, 0.5] would show an inner shading for the 40th-60th percentile, and an outer shading for the 25th-75th percentile, centered on the 50th quantile (~median). Must be within [0,100].
line_kwargs (dict) – optional parameters for line plot.
fill_kwargs (dict) – optional parameters for std fill plot.
**kwargs – Keyword arguments passed to both line plot and fill_between.
Example
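The shaded intervals can be computed separately from the plotting. The NumPy sketch below derives the lower and upper bounds for both spread styles, reading a quantile spread s as the band from 0.5 - s/2 to 0.5 + s/2, which matches the 40th-60th and 25th-75th percentile example in the parameter description. It is not the shaded_line_plot implementation; spread_bounds is a hypothetical helper.

```python
import numpy as np


def spread_bounds(data, spreads, axis=0, spread_style="std"):
    """Return a (lower, upper) pair per spread value (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    bounds = []
    for s in spreads:
        if spread_style == "std":
            center = np.nanmean(data, axis=axis)
            half = s * np.nanstd(data, axis=axis)
            bounds.append((center - half, center + half))
        elif spread_style == "quantile":
            # Assumption: s is the fraction of the data covered, centered on the median.
            lower = np.nanquantile(data, 0.5 - s / 2, axis=axis)
            upper = np.nanquantile(data, 0.5 + s / 2, axis=axis)
            bounds.append((lower, upper))
        else:
            raise ValueError(f"unknown spread_style: {spread_style}")
    return bounds


# 101 samples (axis 0) of a line with a single point (axis 1).
data = np.arange(101.0)[:, np.newaxis]
inner, outer = spread_bounds(data, [0.2, 0.5], spread_style="quantile")
print(inner, outer)
```

Each pair could then be passed to matplotlib's fill_between with a different alpha value.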
- xarrayutils.plotting.tsdiagram(salt, temp, color=None, size=None, lon=None, lat=None, pressure=None, convert_teos10=True, ts_kwargs={}, ax=None, fig=None, draw_density_contours=True, draw_cbar=True, add_labels=True, **kwargs)[source]¶
- xarrayutils.plotting.xr_violinplot(ds, ax=None, x_dim='xt_ocean', width=1, color='0.5')[source]¶
Wrapper of matplotlib violinplot for xarray.DataArray.
- Parameters
ds (xr.DataArray) – Input data.
ax (matplotlib.axis) – Plotting axis (the default is None).
x_dim (str) – dimension that defines the x-axis of the plot (the default is ‘xt_ocean’).
width (float) – Scaling width of each violin (the default is 1).
color (type) – Color of the violin (the default is ‘0.5’).
- Returns
Description of returned object.
- Return type
type
- xarrayutils.file_handling.file_exist_check(filepath, check_zarr_consolidated_complete=True)[source]¶
Check if a file exists, with some extra checks for zarr files
- Parameters
filepath (path) – path to the file to check
check_zarr_consolidated_complete (bool, optional) – Check if .zmetadata file was written (consolidated metadata), by default True
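The extra zarr check can be sketched with pathlib. A zarr store is a directory, so its mere existence does not show that writing finished; the presence of .zmetadata is used here as the completeness signal. This is an illustration of the idea, not the file_exist_check code: store_exists is a hypothetical helper, and detecting zarr stores by the .zarr suffix is an assumption.

```python
import tempfile
from pathlib import Path


def store_exists(filepath, check_zarr_consolidated_complete=True):
    """Check that a path exists and, for zarr stores, that .zmetadata was written (hypothetical helper)."""
    path = Path(filepath)
    if not path.exists():
        return False
    if path.suffix == ".zarr" and check_zarr_consolidated_complete:
        return (path / ".zmetadata").exists()
    return True


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "example.zarr"
    store.mkdir()
    incomplete = store_exists(store)         # directory exists, metadata missing
    (store / ".zmetadata").write_text("{}")
    complete = store_exists(store)           # metadata now present
    missing = store_exists(Path(tmp) / "absent.nc")

print(incomplete, complete, missing)
```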
- xarrayutils.file_handling.temp_write_split(ds_in, folder, method='dimension', dim='time', split_interval=40, zarr_write_kwargs={}, zarr_read_kwargs={}, file_name_pattern='temp_write_split', verbose=False)[source]¶
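Writes ds_in to temporary zarr files in folder, split either along a dimension or by variable, and returns the dataset reloaded from those files.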
- Parameters
ds_in (xr.Dataset) – input
folder (pathlib.Path) – Target folder for temporary files
method (str, optional) – Defines if the temporary files are split by an increment along a certain dimension(“dimension”) or by the variables of the dataset (“variables”), by default “dimension”
dim (str, optional) – Dimension to split along (only relevant for method=”dimension”), by default “time”
split_interval (int, optional) – Steps along dim for each temporary file (only relevant for method=”dimension”), by default 40
zarr_write_kwargs (dict, optional) – Kwargs passed to xr.to_zarr(), by default {}
zarr_read_kwargs (dict, optional) – Kwargs passed to xr.open_zarr(), by default {}
file_name_pattern (str, optional) – Pattern used to name the temporary files, by default “temp_write_split”
verbose (bool, optional) – Activates printing, by default False
- Returns
ds_out (xr.Dataset) – reloaded dataset, with value identical to ds_in
flist (list) – List of paths to temporary datasets written.
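For method="dimension", the splitting amounts to cutting the index range of dim into consecutive pieces of at most split_interval steps. The sketch below only computes those index slices; how temp_write_split names and writes the individual files is not shown, and split_slices is a hypothetical helper.

```python
def split_slices(n_steps, split_interval=40):
    """Return consecutive slices covering range(n_steps) in chunks of split_interval (hypothetical helper)."""
    return [
        slice(start, min(start + split_interval, n_steps))
        for start in range(0, n_steps, split_interval)
    ]


slices = split_slices(100, split_interval=40)
print(slices)  # [slice(0, 40, None), slice(40, 80, None), slice(80, 100, None)]
```

Each slice could then select one piece of the dataset, for example with ds.isel(time=s), before it is written to its own temporary store.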
- xarrayutils.file_handling.total_nested_size(nested)[source]¶
Calculate the size of a nested dict full of xarray objects
- Parameters
nested (dict) – Input dictionary. Can have arbitrary nesting levels
- Returns
total size in bytes
- Return type
float
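Summing sizes over arbitrary nesting levels is a short recursion. In the sketch below, NumPy arrays stand in for xarray objects, since both expose an nbytes attribute. It is an illustration, not the total_nested_size code, and nested_nbytes is a hypothetical name.

```python
import numpy as np


def nested_nbytes(nested):
    """Sum the nbytes of all leaves in an arbitrarily nested dict (hypothetical helper)."""
    total = 0
    for value in nested.values():
        if isinstance(value, dict):
            total += nested_nbytes(value)  # descend into sub-dictionaries
        else:
            total += value.nbytes
    return total


nested = {
    "a": np.zeros(10),                                  # 10 * 8 bytes
    "b": {
        "c": np.zeros((2, 5)),                          # 10 * 8 bytes
        "d": {"e": np.zeros(5, dtype="float32")},       # 5 * 4 bytes
    },
}
print(nested_nbytes(nested))  # 180
```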
- xarrayutils.file_handling.write(ds, path, print_size=True, consolidated=True, **kwargs)[source]¶
Convenience function to save large datasets. Performs the following additional steps (compared to e.g. xr.to_netcdf() or xr.to_zarr()):
1. Checks for existing files (with special checks for zarr files)
2. Handles existing files via the overwrite argument
3. Checks attributes for incompatible values
4. Optional: Prints size of saved dataset
5. Optional: Returns the saved dataset loaded from disk (e.g. for quality control)
- Parameters
ds (xr.Dataset) – Input dataset
path (pathlib.Path) – filepath to save to. Ending determines the output type (.nc for netcdf, .zarr for zarr)
print_size (bool, optional) – If true prints the size of the dataset before saving, by default True
reload_saved (bool, optional) – If true, the returned dataset is opened from the written file; otherwise the input is returned, by default True
open_kwargs (dict) – Arguments passed to the reloading function (either xr.open_dataset or xr.open_zarr based on filename)
write_kwargs (dict) – Arguments passed to the writing function (either xr.to_netcdf or xr.to_zarr based on filename)
overwrite (bool, optional) – If True, overwrite existing files, by default False
check_zarr_consolidated_complete (bool, optional) – If True check if .zmetadata is present in zarr store, and overwrite if not present, by default False
- Returns
Returns either the unmodified input dataset or a reloaded version from the written file
- Return type
xr.Dataset