Indexing and selecting data (2024)

Xarray offers extremely flexible indexing routines that combine the best features of NumPy and pandas for data selection.

The most basic way to access elements of a DataArray object is to use Python’s [] syntax, such as array[i, j], where i and j are both integers. As xarray objects can store coordinates corresponding to each dimension of an array, label-based indexing similar to pandas.DataFrame.loc is also possible. In label-based indexing, the element position i is automatically looked up from the coordinate values.

Dimensions of xarray objects have names, so you can also look up the dimensions by name, instead of remembering their positional order.

Quick overview

In total, xarray supports four different kinds of indexing, as described below and summarized in this table:

| Dimension lookup | Index lookup | DataArray syntax | Dataset syntax |
| --- | --- | --- | --- |
| Positional | By integer | da[:, 0] | not available |
| Positional | By label | da.loc[:, 'IA'] | not available |
| By name | By integer | da.isel(space=0) or da[dict(space=0)] | ds.isel(space=0) or ds[dict(space=0)] |
| By name | By label | da.sel(space='IA') or da.loc[dict(space='IA')] | ds.sel(space='IA') or ds.loc[dict(space='IA')] |
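As a quick check of the table, the sketch below builds a small array shaped like the one used on this page (with deterministic values instead of random ones, an assumption made so the results are reproducible) and confirms that the four DataArray spellings select the same data, while the Dataset only accepts the by-name forms.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same layout as the examples on this page, but with reproducible values
da = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    coords={"time": pd.date_range("2000-01-01", periods=4), "space": ["IA", "IL", "IN"]},
    dims=("time", "space"),
)
ds = da.to_dataset(name="foo")

# One selection per table row, all picking the 'IA' column
by_position = da[:, 0]
by_label = da.loc[:, "IA"]
by_name_integer = da.isel(space=0)
by_name_label = da.sel(space="IA")

for result in (by_label, by_name_integer, by_name_label):
    assert result.identical(by_position)

# The Dataset forms must name the dimension
assert ds.isel(space=0)["foo"].identical(by_position.rename("foo"))
```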

More advanced indexing is also possible for all the methods by supplying DataArray objects as indexers. See Vectorized Indexing for the details.

Positional indexing

Indexing a DataArray directly works (mostly) just like it does for numpy arrays, except that the returned object is always another DataArray:

In [1]: da = xr.DataArray(
   ...:     np.random.rand(4, 3),
   ...:     [
   ...:         ("time", pd.date_range("2000-01-01", periods=4)),
   ...:         ("space", ["IA", "IL", "IN"]),
   ...:     ],
   ...: )

In [2]: da[:2]
Out[2]:
<xarray.DataArray (time: 2, space: 3)> Size: 48B
array([[0.127, 0.967, 0.26 ],
       [0.897, 0.377, 0.336]])
Coordinates:
  * time     (time) datetime64[ns] 16B 2000-01-01 2000-01-02
  * space    (space) <U2 24B 'IA' 'IL' 'IN'

In [3]: da[0, 0]
Out[3]:
<xarray.DataArray ()> Size: 8B
array(0.127)
Coordinates:
    time     datetime64[ns] 8B 2000-01-01
    space    <U2 8B 'IA'

In [4]: da[:, [2, 1]]
Out[4]:
<xarray.DataArray (time: 4, space: 2)> Size: 64B
array([[0.26 , 0.967],
       [0.336, 0.377],
       [0.123, 0.84 ],
       [0.448, 0.373]])
Coordinates:
  * time     (time) datetime64[ns] 32B 2000-01-01 2000-01-02 ... 2000-01-04
  * space    (space) <U2 16B 'IN' 'IL'

Attributes are persisted in all indexing operations.

Warning

Positional indexing deviates from NumPy when indexing with multiple arrays like da[[0, 1], [0, 1]], as described in Vectorized Indexing.
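A minimal sketch of the difference, assuming a plain 3-by-4 integer array: NumPy pairs the two index lists element-wise, while xarray treats unlabeled lists orthogonally. Giving both indexers a shared, new dimension name (here the hypothetical name points) recovers NumPy-style pointwise selection.

```python
import numpy as np
import xarray as xr

values = np.arange(12).reshape(3, 4)
da = xr.DataArray(values, dims=["x", "y"])

# NumPy pairs the index lists element-wise: elements (0, 0) and (1, 1)
numpy_result = values[[0, 1], [0, 1]]

# xarray selects the 2x2 block of rows {0, 1} and columns {0, 1}
xarray_result = da[[0, 1], [0, 1]]

# Indexers sharing a new dimension give pointwise selection instead
points = xr.DataArray([0, 1], dims="points")
pointwise = da[points, points]
```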

Xarray also supports label-based indexing, just like pandas. Because we use a pandas.Index under the hood, label-based indexing is very fast. To do label-based indexing, use the loc attribute:

In [5]: da.loc["2000-01-01":"2000-01-02", "IA"]
Out[5]:
<xarray.DataArray (time: 2)> Size: 16B
array([0.127, 0.897])
Coordinates:
  * time     (time) datetime64[ns] 16B 2000-01-01 2000-01-02
    space    <U2 8B 'IA'

In this example, the selection is the subpart of the array in the range '2000-01-01':'2000-01-02' along the first coordinate, time, with the 'IA' value from the second coordinate, space.

You can perform any of the label indexing operations supported by pandas, including indexing with individual labels, slices, and lists/arrays of labels, as well as indexing with boolean arrays. Like pandas, label-based indexing in xarray is inclusive of both the start and stop bounds.
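The sketch below, using a small date-indexed array made up for illustration, contrasts an inclusive label slice with an exclusive positional slice, and then uses a list of labels and a boolean array with .loc.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2000-01-01", periods=4)
da = xr.DataArray(np.arange(4), coords={"time": times}, dims="time")

# Label slices include both endpoints (pandas semantics) ...
by_label = da.loc["2000-01-01":"2000-01-03"]
# ... while positional slices exclude the stop position (Python semantics)
by_position = da[0:2]

# A list of labels (order is preserved) and a boolean array
by_list = da.loc[pd.to_datetime(["2000-01-04", "2000-01-01"])]
by_mask = da.loc[da.values % 2 == 0]
```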

Setting values with label based indexing is also supported:

In [6]: da.loc["2000-01-01", ["IL", "IN"]] = -10

In [7]: da
Out[7]:
<xarray.DataArray (time: 4, space: 3)> Size: 96B
array([[  0.127, -10.   , -10.   ],
       [  0.897,   0.377,   0.336],
       [  0.451,   0.84 ,   0.123],
       [  0.543,   0.373,   0.448]])
Coordinates:
  * time     (time) datetime64[ns] 32B 2000-01-01 2000-01-02 ... 2000-01-04
  * space    (space) <U2 24B 'IA' 'IL' 'IN'

Indexing with dimension names

With the dimension names, we do not have to rely on dimension order and can use them explicitly to slice data. There are two ways to do this:

  1. Use the sel() and isel() convenience methods:

    # index by integer array indices
    In [8]: da.isel(space=0, time=slice(None, 2))
    Out[8]:
    <xarray.DataArray (time: 2)> Size: 16B
    array([0.127, 0.897])
    Coordinates:
      * time     (time) datetime64[ns] 16B 2000-01-01 2000-01-02
        space    <U2 8B 'IA'

    # index by dimension coordinate labels
    In [9]: da.sel(time=slice("2000-01-01", "2000-01-02"))
    Out[9]:
    <xarray.DataArray (time: 2, space: 3)> Size: 48B
    array([[  0.127, -10.   , -10.   ],
           [  0.897,   0.377,   0.336]])
    Coordinates:
      * time     (time) datetime64[ns] 16B 2000-01-01 2000-01-02
      * space    (space) <U2 24B 'IA' 'IL' 'IN'
  2. Use a dictionary as the argument for array positional or label-based array indexing:

    # index by integer array indices
    In [10]: da[dict(space=0, time=slice(None, 2))]
    Out[10]:
    <xarray.DataArray (time: 2)> Size: 16B
    array([0.127, 0.897])
    Coordinates:
      * time     (time) datetime64[ns] 16B 2000-01-01 2000-01-02
        space    <U2 8B 'IA'

    # index by dimension coordinate labels
    In [11]: da.loc[dict(time=slice("2000-01-01", "2000-01-02"))]
    Out[11]:
    <xarray.DataArray (time: 2, space: 3)> Size: 48B
    array([[  0.127, -10.   , -10.   ],
           [  0.897,   0.377,   0.336]])
    Coordinates:
      * time     (time) datetime64[ns] 16B 2000-01-01 2000-01-02
      * space    (space) <U2 24B 'IA' 'IL' 'IN'

The arguments to these methods can be any objects that could index the array along the dimension given by the keyword, e.g., labels for an individual value, Python slice objects or 1-dimensional arrays.
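For example, assuming a small two-dimensional array with a space coordinate, a scalar label drops the dimension, while a one-element list, a slice, or a 1-D array keeps it:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(6).reshape(2, 3),
    coords={"x": [10, 20], "space": ["IA", "IL", "IN"]},
    dims=("x", "space"),
)

# A scalar label removes "space" (it survives only as a scalar coordinate)
scalar = da.sel(space="IL")
# A one-element list keeps "space" with length 1
listed = da.sel(space=["IL"])
# A slice of labels and a 1-D array of positions also keep the dimension
sliced = da.sel(space=slice("IL", "IN"))
positions = da.isel(space=np.array([2, 0]))
```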

Note

We would love to be able to do indexing with labeled dimension names inside brackets, but unfortunately, Python does not yet support indexing with keyword arguments like da[space=0].

Nearest neighbor lookups

The label-based selection methods sel(), reindex() and reindex_like() all support the method and tolerance keyword arguments. The method parameter enables nearest neighbor (inexact) lookups using the methods 'pad', 'backfill' or 'nearest':

In [12]: da = xr.DataArray([1, 2, 3], [("x", [0, 1, 2])])

In [13]: da.sel(x=[1.1, 1.9], method="nearest")
Out[13]:
<xarray.DataArray (x: 2)> Size: 16B
array([2, 3])
Coordinates:
  * x        (x) int64 16B 1 2

In [14]: da.sel(x=0.1, method="backfill")
Out[14]:
<xarray.DataArray ()> Size: 8B
array(2)
Coordinates:
    x        int64 8B 1

In [15]: da.reindex(x=[0.5, 1, 1.5, 2, 2.5], method="pad")
Out[15]:
<xarray.DataArray (x: 5)> Size: 40B
array([1, 2, 2, 3, 3])
Coordinates:
  * x        (x) float64 40B 0.5 1.0 1.5 2.0 2.5

Tolerance limits the maximum distance for valid matches with an inexact lookup:

In [16]: da.reindex(x=[1.1, 1.5], method="nearest", tolerance=0.2)
Out[16]:
<xarray.DataArray (x: 2)> Size: 16B
array([ 2., nan])
Coordinates:
  * x        (x) float64 16B 1.1 1.5
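The same tolerance behaves differently with sel(): rather than filling with NaN, a lookup with no match inside the tolerance raises a KeyError. A short sketch using the array above:

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], [("x", [0, 1, 2])])

# reindex fills labels outside the tolerance with NaN ...
filled = da.reindex(x=[1.1, 1.5], method="nearest", tolerance=0.2)

# ... while sel has nothing to return for such a label and raises instead
try:
    da.sel(x=1.5, method="nearest", tolerance=0.2)
    raised = False
except KeyError:
    raised = True

# Inside the tolerance, sel returns the nearest match as usual
match = da.sel(x=1.1, method="nearest", tolerance=0.2)
```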

The method parameter is not yet supported if any of the arguments to .sel() is a slice object:

In [17]: da.sel(x=slice(1, 3), method="nearest")
NotImplementedError

However, you don’t need to use method to do inexact slicing. Slicing already returns all values inside the range (inclusive), as long as the index labels are monotonically increasing:

In [18]: da.sel(x=slice(0.9, 3.1))
Out[18]:
<xarray.DataArray (x: 2)> Size: 16B
array([2, 3])
Coordinates:
  * x        (x) int64 16B 1 2

Indexing axes with monotonically decreasing labels also works, as long as the slice or .loc arguments are also decreasing:

In [19]: reversed_da = da[::-1]

In [20]: reversed_da.loc[3.1:0.9]
Out[20]:
<xarray.DataArray (x: 2)> Size: 16B
array([3, 2])
Coordinates:
  * x        (x) int64 16B 2 1
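Here is a sketch of what happens when the bounds are in the wrong order, reusing the small array from above. With increasing bounds on a decreasing index, we expect the selection to come back empty rather than raise an error. Sorting the labels first with sortby() is one way to avoid the issue.

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], [("x", [0, 1, 2])])
reversed_da = da[::-1]  # x labels are now 2, 1, 0

# Increasing bounds on a decreasing index select nothing
empty = reversed_da.sel(x=slice(0.9, 3.1))

# Either reverse the bounds, or restore increasing order before slicing
decreasing = reversed_da.sel(x=slice(3.1, 0.9))
resorted = reversed_da.sortby("x").sel(x=slice(0.9, 3.1))
```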

Note

If you want to interpolate along coordinates rather than looking up the nearest neighbors, use interp() and interp_like(). See interpolation for the details.

Dataset indexing

We can also use these methods to index all variables in a dataset simultaneously, returning a new dataset:

In [21]: da = xr.DataArray(
   ....:     np.random.rand(4, 3),
   ....:     [
   ....:         ("time", pd.date_range("2000-01-01", periods=4)),
   ....:         ("space", ["IA", "IL", "IN"]),
   ....:     ],
   ....: )

In [22]: ds = da.to_dataset(name="foo")

In [23]: ds.isel(space=[0], time=[0])
Out[23]:
<xarray.Dataset> Size: 24B
Dimensions:  (time: 1, space: 1)
Coordinates:
  * time     (time) datetime64[ns] 8B 2000-01-01
  * space    (space) <U2 8B 'IA'
Data variables:
    foo      (time, space) float64 8B 0.1294

In [24]: ds.sel(time="2000-01-01")
Out[24]:
<xarray.Dataset> Size: 56B
Dimensions:  (space: 3)
Coordinates:
    time     datetime64[ns] 8B 2000-01-01
  * space    (space) <U2 24B 'IA' 'IL' 'IN'
Data variables:
    foo      (space) float64 24B 0.1294 0.8599 0.8204

Positional indexing on a dataset is not supported because the ordering of dimensions in a dataset is somewhat ambiguous (it can vary between different arrays). However, you can do normal indexing with dimension names:

In [25]: ds[dict(space=[0], time=[0])]
Out[25]:
<xarray.Dataset> Size: 24B
Dimensions:  (time: 1, space: 1)
Coordinates:
  * time     (time) datetime64[ns] 8B 2000-01-01
  * space    (space) <U2 8B 'IA'
Data variables:
    foo      (time, space) float64 8B 0.1294

In [26]: ds.loc[dict(time="2000-01-01")]
Out[26]:
<xarray.Dataset> Size: 56B
Dimensions:  (space: 3)
Coordinates:
    time     datetime64[ns] 8B 2000-01-01
  * space    (space) <U2 24B 'IA' 'IL' 'IN'
Data variables:
    foo      (space) float64 24B 0.1294 0.8599 0.8204
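Variables that lack the indexed dimension pass through with their values unchanged. The sketch below assumes a dataset with an extra, hypothetical station_id variable that only has the space dimension:

```python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {
        "foo": (("time", "space"), np.zeros((4, 3))),
        # hypothetical variable without a "time" dimension
        "station_id": ("space", [101, 102, 103]),
    },
    coords={"time": pd.date_range("2000-01-01", periods=4), "space": ["IA", "IL", "IN"]},
)

# "foo" loses its time dimension; "station_id" keeps its values
first_time = ds.isel(time=0)
```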

Dropping labels and dimensions

The drop_sel() method returns a new object with the listed index labels along a dimension dropped:

In [27]: ds.drop_sel(space=["IN", "IL"])
Out[27]:
<xarray.Dataset> Size: 72B
Dimensions:  (time: 4, space: 1)
Coordinates:
  * time     (time) datetime64[ns] 32B 2000-01-01 2000-01-02 ... 2000-01-04
  * space    (space) <U2 8B 'IA'
Data variables:
    foo      (time, space) float64 32B 0.1294 0.3521 0.5948 0.2355

drop_sel is both a Dataset and DataArray method.

Use drop_dims() to drop a full dimension from a Dataset. Any variables with these dimensions are also dropped:

In [28]: ds.drop_dims("time")
Out[28]:
<xarray.Dataset> Size: 24B
Dimensions:  (space: 3)
Coordinates:
  * space    (space) <U2 24B 'IA' 'IL' 'IN'
Data variables:
    *empty*

Masking with where

Indexing methods on xarray objects generally return a subset of the original data. However, it is sometimes useful to select an object with the same shape as the original data, but with some elements masked. To do this type of selection in xarray, use where():

In [29]: da = xr.DataArray(np.arange(16).reshape(4, 4), dims=["x", "y"])

In [30]: da.where(da.x + da.y < 4)
Out[30]:
<xarray.DataArray (x: 4, y: 4)> Size: 128B
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6., nan],
       [ 8.,  9., nan, nan],
       [12., nan, nan, nan]])
Dimensions without coordinates: x, y

This is particularly useful for ragged indexing of multi-dimensional data, e.g., to apply a 2D mask to an image. Note that where follows all the usual xarray broadcasting and alignment rules for binary operations (e.g., +) between the object being indexed and the condition, as described in Computation:

In [31]: da.where(da.y < 2)
Out[31]:
<xarray.DataArray (x: 4, y: 4)> Size: 128B
array([[ 0.,  1., nan, nan],
       [ 4.,  5., nan, nan],
       [ 8.,  9., nan, nan],
       [12., 13., nan, nan]])
Dimensions without coordinates: x, y

By default, where maintains the original size of the data. For cases where the selected data size is much smaller than the original data, use the option drop=True to remove coordinate labels along which all elements are masked:

In [32]: da.where(da.y < 2, drop=True)
Out[32]:
<xarray.DataArray (x: 4, y: 2)> Size: 64B
array([[ 0.,  1.],
       [ 4.,  5.],
       [ 8.,  9.],
       [12., 13.]])
Dimensions without coordinates: x, y
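The same pattern applies a 2D mask. In this sketch the mask is built by broadcasting two 1-D conditions (an arbitrary choice for illustration), and drop=True then trims only the rows and columns that are entirely masked:

```python
import numpy as np
import xarray as xr

img = xr.DataArray(np.arange(16).reshape(4, 4), dims=["x", "y"])

# Broadcasting two 1-D conditions gives a 2-D boolean mask
keep = (img.x < 2) & (img.y < 3)

masked = img.where(keep)              # same 4x4 shape, masked cells become NaN
clipped = img.where(keep, drop=True)  # fully masked rows/columns are removed
```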

Selecting values with isin

To check whether elements of an xarray object are equal to a single value, you can compare with the equality operator == (e.g., arr == 3). To check for multiple values, use isin():

In [33]: da = xr.DataArray([1, 2, 3, 4, 5], dims=["x"])

In [34]: da.isin([2, 4])
Out[34]:
<xarray.DataArray (x: 5)> Size: 5B
array([False,  True, False,  True, False])
Dimensions without coordinates: x

isin() works particularly well with where() to support indexing by arrays that are not already labels of an array:

In [35]: lookup = xr.DataArray([-1, -2, -3, -4, -5], dims=["x"])

In [36]: da.where(lookup.isin([-2, -4]), drop=True)
Out[36]:
<xarray.DataArray (x: 2)> Size: 16B
array([2., 4.])
Dimensions without coordinates: x

However, some caution is in order: when done repeatedly, this type of indexing is significantly slower than using sel().
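When the values you are looking for are already coordinate labels, sel() is the direct route. The sketch below, with made-up data, shows that both approaches pick the same elements, but the where() result is converted to float so that it can hold NaN:

```python
import xarray as xr

da = xr.DataArray([10, 20, 30, 40, 50], coords={"x": [1, 2, 3, 4, 5]}, dims="x")

# Mask-based selection: works for any condition, returns floats
via_where = da.where(da.x.isin([2, 4]), drop=True)
# Direct label lookup: keeps the original integer dtype
via_sel = da.sel(x=[2, 4])
```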

Vectorized Indexing

Like numpy and pandas, xarray supports indexing many array elements at once in a vectorized manner.

If you only provide integers, slices, or unlabeled arrays (arrays without dimension names, such as np.ndarray or list, but not DataArray() or Variable()), indexing is understood orthogonally. Each indexer component selects independently along the corresponding dimension, similar to how vector indexing works in Fortran or MATLAB, or after using the numpy.ix_() helper:

In [37]: da = xr.DataArray(
   ....:     np.arange(12).reshape((3, 4)),
   ....:     dims=["x", "y"],
   ....:     coords={"x": [0, 1, 2], "y": ["a", "b", "c", "d"]},
   ....: )

In [38]: da
Out[38]:
<xarray.DataArray (x: 3, y: 4)> Size: 96B
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * x        (x) int64 24B 0 1 2
  * y        (y) <U1 16B 'a' 'b' 'c' 'd'

In [39]: da[[0, 2, 2], [1, 3]]
Out[39]:
<xarray.DataArray (x: 3, y: 2)> Size: 48B
array([[ 1,  3],
       [ 9, 11],
       [ 9, 11]])
Coordinates:
  * x        (x) int64 24B 0 2 2
  * y        (y) <U1 8B 'b' 'd'
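The correspondence with numpy.ix_() can be checked directly on the array above:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(12).reshape((3, 4)),
    dims=["x", "y"],
    coords={"x": [0, 1, 2], "y": ["a", "b", "c", "d"]},
)

rows, cols = [0, 2, 2], [1, 3]
# Orthogonal indexing in xarray ...
orthogonal = da[rows, cols]
# ... matches NumPy once the index lists are combined with numpy.ix_
via_ix = da.values[np.ix_(rows, cols)]
```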

For more flexibility, you can supply DataArray() objects as indexers. Dimensions on resultant arrays are given by the ordered union of the indexers’ dimensions:

In [40]: ind_x = xr.DataArray([0, 1], dims=["x"])

In [41]: ind_y = xr.DataArray([0, 1], dims=["y"])

In [42]: da[ind_x, ind_y]  # orthogonal indexing
Out[42]:
<xarray.DataArray (x: 2, y: 2)> Size: 32B
array([[0, 1],
       [4, 5]])
Coordinates:
  * x        (x) int64 16B 0 1
  * y        (y) <U1 8B 'a' 'b'

Slices or sequences/arrays without named dimensions are treated as if they have the same dimension as the one being indexed along:

# Because [0, 1] is used to index along dimension 'x',
# it is assumed to have dimension 'x'
In [43]: da[[0, 1], ind_x]
Out[43]:
<xarray.DataArray (x: 2)> Size: 16B
array([0, 5])
Coordinates:
  * x        (x) int64 16B 0 1
    y        (x) <U1 8B 'a' 'b'

Furthermore, you can use multi-dimensional DataArray() objects as indexers, in which case the dimensions of the resultant array are also determined by the indexers’ dimensions:

In [44]: ind = xr.DataArray([[0, 1], [0, 1]], dims=["a", "b"])

In [45]: da[ind]
Out[45]:
<xarray.DataArray (a: 2, b: 2, y: 4)> Size: 128B
array([[[0, 1, 2, 3],
        [4, 5, 6, 7]],

       [[0, 1, 2, 3],
        [4, 5, 6, 7]]])
Coordinates:
    x        (a, b) int64 32B 0 1 0 1
  * y        (y) <U1 16B 'a' 'b' 'c' 'd'
Dimensions without coordinates: a, b

Similar to how NumPy’s advanced indexing works, vectorized indexing for xarray is based on our broadcasting rules. See Indexing rules for the complete specification.

Vectorized indexing also works with isel, loc, and sel:

In [46]: ind = xr.DataArray([[0, 1], [0, 1]], dims=["a", "b"])

In [47]: da.isel(y=ind)  # same as da[:, ind]
Out[47]:
<xarray.DataArray (x: 3, a: 2, b: 2)> Size: 96B
array([[[0, 1],
        [0, 1]],

       [[4, 5],
        [4, 5]],

       [[8, 9],
        [8, 9]]])
Coordinates:
  * x        (x) int64 24B 0 1 2
    y        (a, b) <U1 16B 'a' 'b' 'a' 'b'
Dimensions without coordinates: a, b

In [48]: ind = xr.DataArray([["a", "b"], ["b", "a"]], dims=["a", "b"])

In [49]: da.loc[:, ind]  # same as da.sel(y=ind)
Out[49]:
<xarray.DataArray (x: 3, a: 2, b: 2)> Size: 96B
array([[[0, 1],
        [1, 0]],

       [[4, 5],
        [5, 4]],

       [[8, 9],
        [9, 8]]])
Coordinates:
  * x        (x) int64 24B 0 1 2
    y        (a, b) <U1 16B 'a' 'b' 'b' 'a'
Dimensions without coordinates: a, b

These methods may also be applied to Dataset objects:

In [50]: ds = da.to_dataset(name="bar")

In [51]: ds.isel(x=xr.DataArray([0, 1, 2], dims=["points"]))
Out[51]:
<xarray.Dataset> Size: 136B
Dimensions:  (points: 3, y: 4)
Coordinates:
    x        (points) int64 24B 0 1 2
  * y        (y) <U1 16B 'a' 'b' 'c' 'd'
Dimensions without coordinates: points
Data variables:
    bar      (points, y) int64 96B 0 1 2 3 4 5 6 7 8 9 10 11

Vectorized indexing may be used to extract information from the nearest grid cells of interest, for example, the nearest climate model grid cells to a collection of specified weather station latitudes and longitudes. To trigger vectorized indexing behavior, you will need to provide the selection dimensions with a new shared output dimension name. In the example below, the selections of the closest latitude and longitude are renamed to an output dimension named “points”:

In [52]: ds = xr.tutorial.open_dataset("air_temperature")

# Define target latitude and longitude (where weather stations might be)
In [53]: target_lon = xr.DataArray([200, 201, 202, 205], dims="points")

In [54]: target_lat = xr.DataArray([31, 41, 42, 42], dims="points")

# Retrieve data at the grid cells nearest to the target latitudes and longitudes
In [55]: da = ds["air"].sel(lon=target_lon, lat=target_lat, method="nearest")

In [56]: da
Out[56]:
<xarray.DataArray 'air' (time: 2920, points: 4)> Size: 93kB
[11680 values with dtype=float64]
Coordinates:
    lat      (points) float32 16B 30.0 40.0 42.5 42.5
    lon      (points) float32 16B 200.0 200.0 202.5 205.0
  * time     (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
Dimensions without coordinates: points
Attributes:
    long_name:     4xDaily Air temperature at sigma level 995
    units:         degK
    precision:     2
    GRIB_id:       11
    GRIB_name:     TMP
    var_desc:      Air temperature
    dataset:       NMC Reanalysis
    level_desc:    Surface
    statistic:     Individual Obs
    parent_stat:   Other
    actual_range:  [185.16 322.1 ]
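The tutorial dataset is downloaded on first use. For a self-contained version of the same pointwise nearest-neighbor lookup, here is a sketch on a small synthetic grid (the grid spacing and values are arbitrary assumptions, chosen to mimic a 2.5-degree grid):

```python
import numpy as np
import xarray as xr

# Synthetic 2.5-degree grid with arbitrary values
lat = np.arange(20.0, 50.0, 2.5)
lon = np.arange(200.0, 210.0, 2.5)
grid = xr.DataArray(
    np.arange(lat.size * lon.size, dtype=float).reshape(lat.size, lon.size),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
)

# Both indexers share the new "points" dimension, so selection is pointwise
station_lat = xr.DataArray([31, 41, 42, 42], dims="points")
station_lon = xr.DataArray([200, 201, 202, 205], dims="points")

nearest = grid.sel(lat=station_lat, lon=station_lon, method="nearest")
```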

Tip

If you are lazily loading your data from disk, not every form of vectorized indexing is supported (or, if supported, it may not be supported efficiently). You may find increased performance by loading your data into memory first, e.g., with load().

Note

If an indexer is a DataArray(), its coordinates should not conflict with the selected subpart of the target array (except for the explicitly indexed dimensions with .loc/.sel). Otherwise, IndexError will be raised.

Assigning values with indexing

To select and assign values to a portion of a DataArray() you can use indexing with .loc:

In [57]: ds = xr.tutorial.open_dataset("air_temperature")

# add an empty 2D dataarray
In [58]: ds["empty"] = xr.full_like(ds.air.mean("time"), fill_value=0)

# modify one grid point using loc()
In [59]: ds["empty"].loc[dict(lon=260, lat=30)] = 100

# modify a 2D region using loc()
In [60]: lc = ds.coords["lon"]

In [61]: la = ds.coords["lat"]

In [62]: ds["empty"].loc[
   ....:     dict(lon=lc[(lc > 220) & (lc < 260)], lat=la[(la > 20) & (la < 60)])
   ....: ] = 100

or where():

# modify one grid point using xr.where()
In [63]: ds["empty"] = xr.where(
   ....:     (ds.coords["lat"] == 20) & (ds.coords["lon"] == 260), 100, ds["empty"]
   ....: )

# or modify a 2D region using xr.where()
In [64]: mask = (
   ....:     (ds.coords["lat"] > 20)
   ....:     & (ds.coords["lat"] < 60)
   ....:     & (ds.coords["lon"] > 220)
   ....:     & (ds.coords["lon"] < 260)
   ....: )

In [65]: ds["empty"] = xr.where(mask, 100, ds["empty"])

Vectorized indexing can also be used to assign values to xarray objects.

In [66]: da = xr.DataArray(
   ....:     np.arange(12).reshape((3, 4)),
   ....:     dims=["x", "y"],
   ....:     coords={"x": [0, 1, 2], "y": ["a", "b", "c", "d"]},
   ....: )

In [67]: da
Out[67]:
<xarray.DataArray (x: 3, y: 4)> Size: 96B
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * x        (x) int64 24B 0 1 2
  * y        (y) <U1 16B 'a' 'b' 'c' 'd'

In [68]: da[0] = -1  # assignment with broadcasting

In [69]: da
Out[69]:
<xarray.DataArray (x: 3, y: 4)> Size: 96B
array([[-1, -1, -1, -1],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * x        (x) int64 24B 0 1 2
  * y        (y) <U1 16B 'a' 'b' 'c' 'd'

In [70]: ind_x = xr.DataArray([0, 1], dims=["x"])

In [71]: ind_y = xr.DataArray([0, 1], dims=["y"])

In [72]: da[ind_x, ind_y] = -2  # orthogonal: assign -2 to every (ix, iy) with ix, iy in {0, 1}

In [73]: da
Out[73]:
<xarray.DataArray (x: 3, y: 4)> Size: 96B
array([[-2, -2, -1, -1],
       [-2, -2,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * x        (x) int64 24B 0 1 2
  * y        (y) <U1 16B 'a' 'b' 'c' 'd'

In [74]: da[ind_x, ind_y] += 100  # increment is also possible

In [75]: da
Out[75]:
<xarray.DataArray (x: 3, y: 4)> Size: 96B
array([[98, 98, -1, -1],
       [98, 98,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * x        (x) int64 24B 0 1 2
  * y        (y) <U1 16B 'a' 'b' 'c' 'd'

As with numpy.ndarray, value assignment sometimes works differently from what one may expect:

In [76]: da = xr.DataArray([0, 1, 2, 3], dims=["x"])

In [77]: ind = xr.DataArray([0, 0, 0], dims=["x"])

In [78]: da[ind] -= 1

In [79]: da
Out[79]:
<xarray.DataArray (x: 4)> Size: 32B
array([-1,  1,  2,  3])
Dimensions without coordinates: x

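Here, 1 is subtracted from the 0th element only once. This is because da[ind] -= 1 expands to da[ind] = da[ind] - 1: the right-hand side is computed once from the original values, so v[0] = v[0] - 1 is effectively performed three times, rather than v[0] = v[0] - 1 - 1 - 1. See Assigning values to indexed arrays for the details.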
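If you do want repeated indices to accumulate, one option outside xarray's indexing is NumPy's unbuffered ufunc.at, applied to a copy of the data. This swaps in a NumPy technique; the sketch below is one possible approach, not an xarray API:

```python
import numpy as np
import xarray as xr

da = xr.DataArray([0, 1, 2, 3], dims=["x"])

# np.subtract.at applies the operation once per occurrence of each index,
# so the three repeated 0s subtract 1 three times
values = da.values.copy()
np.subtract.at(values, [0, 0, 0], 1)

# Wrap the result back into a DataArray with the same dims/coords
accumulated = da.copy(data=values)
```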

Note

Dask array does not support value assignment (see Parallel computing with Dask for the details).

Note

Coordinates in both the left- and right-hand-side arrays should not conflict with each other. Otherwise, IndexError will be raised.

Warning

Do not try to assign values when using any of the indexing methods isel or sel:

# DO NOT do this
da.isel(space=0) = 0

Instead, values can be assigned using dictionary-based indexing:

da[dict(space=0)] = 0

Assigning values with chained indexing using .sel or .isel fails silently.

In [80]: da = xr.DataArray([0, 1, 2, 3], dims=["x"])

# DO NOT do this
In [81]: da.isel(x=[0, 1, 2])[1] = -1

In [82]: da
Out[82]:
<xarray.DataArray (x: 4)> Size: 32B
array([0, 1, 2, 3])
Dimensions without coordinates: x
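The reliable pattern is to index the original object directly. A minimal sketch, including a second, labeled array (its x values are made up) for the .loc form:

```python
import xarray as xr

da = xr.DataArray([0, 1, 2, 3], dims=["x"])

# Assign through the original object, not through the result of isel/sel
da[dict(x=1)] = -1

# With coordinate labels, .loc with a dict assigns by label
labeled = xr.DataArray([0, 1, 2, 3], coords={"x": [10, 20, 30, 40]}, dims="x")
labeled.loc[dict(x=20)] = -1
```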

You can also assign values to all variables of a Dataset at once:

In [83]: ds_org = xr.tutorial.open_dataset("eraint_uvz").isel(
   ....:     latitude=slice(56, 59), longitude=slice(255, 258), level=0
   ....: )

# set all values to 0
In [84]: ds = xr.zeros_like(ds_org)

In [85]: ds
Out[85]:
<xarray.Dataset> Size: 468B
Dimensions:    (month: 2, latitude: 3, longitude: 3)
Coordinates:
  * longitude  (longitude) float32 12B 11.25 12.0 12.75
  * latitude   (latitude) float32 12B 48.0 47.25 46.5
    level      int32 4B 200
  * month      (month) int32 8B 1 7
Data variables:
    z          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 0.0
    u          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 0.0
    v          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 0.0
Attributes:
    Conventions:  CF-1.0
    Info:         Monthly ERA-Interim data. Downloaded and edited by fabien.m...

# by integer
In [86]: ds[dict(latitude=2, longitude=2)] = 1

In [87]: ds["u"]
Out[87]:
<xarray.DataArray 'u' (month: 2, latitude: 3, longitude: 3)> Size: 144B
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 1.]],

       [[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 1.]]])
Coordinates:
  * longitude  (longitude) float32 12B 11.25 12.0 12.75
  * latitude   (latitude) float32 12B 48.0 47.25 46.5
    level      int32 4B 200
  * month      (month) int32 8B 1 7
Attributes:
    number_of_significant_digits:  2
    units:                         m s**-1
    long_name:                     U component of wind
    standard_name:                 eastward_wind

In [88]: ds["v"]
Out[88]:
<xarray.DataArray 'v' (month: 2, latitude: 3, longitude: 3)> Size: 144B
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 1.]],

       [[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 1.]]])
Coordinates:
  * longitude  (longitude) float32 12B 11.25 12.0 12.75
  * latitude   (latitude) float32 12B 48.0 47.25 46.5
    level      int32 4B 200
  * month      (month) int32 8B 1 7
Attributes:
    number_of_significant_digits:  2
    units:                         m s**-1
    long_name:                     V component of wind
    standard_name:                 northward_wind

# by label
In [89]: ds.loc[dict(latitude=47.25, longitude=[11.25, 12])] = 100

In [90]: ds["u"]
Out[90]:
<xarray.DataArray 'u' (month: 2, latitude: 3, longitude: 3)> Size: 144B
array([[[  0.,   0.,   0.],
        [100., 100.,   0.],
        [  0.,   0.,   1.]],

       [[  0.,   0.,   0.],
        [100., 100.,   0.],
        [  0.,   0.,   1.]]])
Coordinates:
  * longitude  (longitude) float32 12B 11.25 12.0 12.75
  * latitude   (latitude) float32 12B 48.0 47.25 46.5
    level      int32 4B 200
  * month      (month) int32 8B 1 7
Attributes:
    number_of_significant_digits:  2
    units:                         m s**-1
    long_name:                     U component of wind
    standard_name:                 eastward_wind

# dataset as new values
In [91]: new_dat = ds_org.loc[dict(latitude=48, longitude=[11.25, 12])]

In [92]: new_dat
Out[92]:
<xarray.Dataset> Size: 120B
Dimensions:    (longitude: 2, month: 2)
Coordinates:
  * longitude  (longitude) float32 8B 11.25 12.0
    latitude   float32 4B 48.0
    level      int32 4B 200
  * month      (month) int32 8B 1 7
Data variables:
    z          (month, longitude) float64 32B 1.136e+05 1.136e+05 ... 1.187e+05
    u          (month, longitude) float64 32B 12.75 12.69 14.87 14.62
    v          (month, longitude) float64 32B -7.891 -7.781 -1.875 -1.984
Attributes:
    Conventions:  CF-1.0
    Info:         Monthly ERA-Interim data. Downloaded and edited by fabien.m...

In [93]: ds.loc[dict(latitude=47.25, longitude=[11.25, 12])] = new_dat

In [94]: ds["u"]
Out[94]:
<xarray.DataArray 'u' (month: 2, latitude: 3, longitude: 3)> Size: 144B
array([[[ 0.   ,  0.   ,  0.   ],
        [12.75 , 12.687,  0.   ],
        [ 0.   ,  0.   ,  1.   ]],

       [[ 0.   ,  0.   ,  0.   ],
        [14.875, 14.625,  0.   ],
        [ 0.   ,  0.   ,  1.   ]]])
Coordinates:
  * longitude  (longitude) float32 12B 11.25 12.0 12.75
  * latitude   (latitude) float32 12B 48.0 47.25 46.5
    level      int32 4B 200
  * month      (month) int32 8B 1 7
Attributes:
    number_of_significant_digits:  2
    units:                         m s**-1
    long_name:                     U component of wind
    standard_name:                 eastward_wind

The dimensions can differ between the variables in the dataset, but all variables need to have at least the dimensions specified in the indexer dictionary. The new values must be either a scalar, a DataArray, or a Dataset itself that contains all variables that also appear in the dataset to be modified.
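A small sketch of both rules, assuming a dataset made up for illustration. Variables with different dimensions can be updated together as long as each has the indexed dimension. A variable without it makes the assignment fail with a ValueError; in recent xarray versions this check runs before any variable is modified.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "a": (("x", "y"), np.zeros((2, 3))),  # has x and y
        "b": ("x", np.zeros(2)),              # has only x
    }
)

# Both variables have "x", so a dict indexer on "x" updates both
ds[dict(x=0)] = 1

# A variable lacking "x" makes the whole assignment fail
ds["c"] = ("y", np.zeros(3))
try:
    ds[dict(x=1)] = 5
    failed = False
except ValueError:
    failed = True
```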

More advanced indexing

The use of DataArray() objects as indexers enables very flexible indexing. The following is an example of pointwise indexing:

In [95]: da = xr.DataArray(np.arange(56).reshape((7, 8)), dims=["x", "y"])

In [96]: da
Out[96]:
<xarray.DataArray (x: 7, y: 8)> Size: 448B
array([[ 0,  1,  2, ...,  5,  6,  7],
       [ 8,  9, 10, ..., 13, 14, 15],
       [16, 17, 18, ..., 21, 22, 23],
       ...,
       [32, 33, 34, ..., 37, 38, 39],
       [40, 41, 42, ..., 45, 46, 47],
       [48, 49, 50, ..., 53, 54, 55]])
Dimensions without coordinates: x, y

In [97]: da.isel(x=xr.DataArray([0, 1, 6], dims="z"), y=xr.DataArray([0, 1, 0], dims="z"))
Out[97]:
<xarray.DataArray (z: 3)> Size: 24B
array([ 0,  9, 48])
Dimensions without coordinates: z

where three elements at (ix, iy) = ((0, 0), (1, 1), (6, 0)) are selected and mapped along a new dimension z.

If you want to add a coordinate to the new dimension z, you can supply a DataArray with a coordinate:

In [98]: da.isel(
   ....:     x=xr.DataArray([0, 1, 6], dims="z", coords={"z": ["a", "b", "c"]}),
   ....:     y=xr.DataArray([0, 1, 0], dims="z"),
   ....: )
Out[98]:
<xarray.DataArray (z: 3)> Size: 24B
array([ 0,  9, 48])
Coordinates:
  * z        (z) <U1 12B 'a' 'b' 'c'

Analogously, label-based pointwise indexing is also possible with the .sel method:

In [99]: da = xr.DataArray(
   ....:     np.random.rand(4, 3),
   ....:     [
   ....:         ("time", pd.date_range("2000-01-01", periods=4)),
   ....:         ("space", ["IA", "IL", "IN"]),
   ....:     ],
   ....: )

In [100]: times = xr.DataArray(
   .....:     pd.to_datetime(["2000-01-03", "2000-01-02", "2000-01-01"]), dims="new_time"
   .....: )

In [101]: da.sel(space=xr.DataArray(["IA", "IL", "IN"], dims=["new_time"]), time=times)
Out[101]:
<xarray.DataArray (new_time: 3)> Size: 24B
array([0.92, 0.34, 0.59])
Coordinates:
    time      (new_time) datetime64[ns] 24B 2000-01-03 2000-01-02 2000-01-01
    space     (new_time) <U2 24B 'IA' 'IL' 'IN'
  * new_time  (new_time) datetime64[ns] 24B 2000-01-03 2000-01-02 2000-01-01

Align and reindex

Xarray’s reindex, reindex_like and align impose a DataArray or Dataset onto a new set of coordinates corresponding to dimensions. The original values are subset to the index labels still found in the new labels, and values corresponding to new labels not found in the original object are in-filled with NaN.

Xarray operations that combine multiple objects generally align their arguments automatically to share the same indexes. However, manual alignment can be useful for greater control and for increased performance.

To reindex a particular dimension, use reindex():

In [102]: da.reindex(space=["IA", "CA"])
Out[102]:
<xarray.DataArray (time: 4, space: 2)> Size: 64B
array([[0.574,   nan],
       [0.245,   nan],
       [0.92 ,   nan],
       [0.754,   nan]])
Coordinates:
  * time     (time) datetime64[ns] 32B 2000-01-01 2000-01-02 ... 2000-01-04
  * space    (space) <U2 16B 'IA' 'CA'
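Because missing labels are filled with NaN, an integer array is promoted to float. A sketch with a small integer array made up for illustration; reindex() also accepts a fill_value argument if another placeholder suits the data better:

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], coords={"space": ["IA", "IL", "IN"]}, dims="space")

# The new label "CA" gets NaN, which forces a float dtype
with_nan = da.reindex(space=["IA", "CA"])
# fill_value supplies a different placeholder for the new label
with_zero = da.reindex(space=["IA", "CA"], fill_value=0)
```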

The reindex_like() method is a useful shortcut. To demonstrate, we will make a subset DataArray with new values:

In [103]: foo = da.rename("foo")

In [104]: baz = (10 * da[:2, :2]).rename("baz")

In [105]: baz
Out[105]:
<xarray.DataArray 'baz' (time: 2, space: 2)> Size: 32B
array([[5.74 , 0.613],
       [2.453, 3.404]])
Coordinates:
  * time     (time) datetime64[ns] 16B 2000-01-01 2000-01-02
  * space    (space) <U2 16B 'IA' 'IL'

Reindexing foo with baz selects out the first two values along each dimension:

In [106]: foo.reindex_like(baz)
Out[106]:
<xarray.DataArray 'foo' (time: 2, space: 2)> Size: 32B
array([[0.574, 0.061],
       [0.245, 0.34 ]])
Coordinates:
  * time     (time) datetime64[ns] 16B 2000-01-01 2000-01-02
  * space    (space) <U2 16B 'IA' 'IL'

The opposite operation asks us to reindex to a larger shape, so we fill in the missing values with NaN:

In [107]: baz.reindex_like(foo)
Out[107]:
<xarray.DataArray 'baz' (time: 4, space: 3)> Size: 96B
array([[5.74 , 0.613,   nan],
       [2.453, 3.404,   nan],
       [  nan,   nan,   nan],
       [  nan,   nan,   nan]])
Coordinates:
  * time     (time) datetime64[ns] 32B 2000-01-01 2000-01-02 ... 2000-01-04
  * space    (space) <U2 24B 'IA' 'IL' 'IN'

The align() function lets us perform more flexible database-like 'inner', 'outer', 'left' and 'right' joins:

In [108]: xr.align(foo, baz, join="inner")
Out[108]:
(<xarray.DataArray 'foo' (time: 2, space: 2)> Size: 32B
 array([[0.574, 0.061],
        [0.245, 0.34 ]])
 Coordinates:
   * time     (time) datetime64[ns] 16B 2000-01-01 2000-01-02
   * space    (space) <U2 16B 'IA' 'IL',
 <xarray.DataArray 'baz' (time: 2, space: 2)> Size: 32B
 array([[5.74 , 0.613],
        [2.453, 3.404]])
 Coordinates:
   * time     (time) datetime64[ns] 16B 2000-01-01 2000-01-02
   * space    (space) <U2 16B 'IA' 'IL')

In [109]: xr.align(foo, baz, join="outer")
Out[109]:
(<xarray.DataArray 'foo' (time: 4, space: 3)> Size: 96B
 array([[0.574, 0.061, 0.59 ],
        [0.245, 0.34 , 0.985],
        [0.92 , 0.038, 0.862],
        [0.754, 0.405, 0.344]])
 Coordinates:
   * time     (time) datetime64[ns] 32B 2000-01-01 2000-01-02 ... 2000-01-04
   * space    (space) <U2 24B 'IA' 'IL' 'IN',
 <xarray.DataArray 'baz' (time: 4, space: 3)> Size: 96B
 array([[5.74 , 0.613,   nan],
        [2.453, 3.404,   nan],
        [  nan,   nan,   nan],
        [  nan,   nan,   nan]])
 Coordinates:
   * time     (time) datetime64[ns] 32B 2000-01-01 2000-01-02 ... 2000-01-04
   * space    (space) <U2 24B 'IA' 'IL' 'IN')
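The 'left' and 'right' joins keep the labels of the first or second argument, respectively. A sketch with two small 1-D arrays made up for illustration:

```python
import xarray as xr

a = xr.DataArray([1, 2, 3], coords={"x": [0, 1, 2]}, dims="x")
b = xr.DataArray([10, 20], coords={"x": [1, 5]}, dims="x")

# "left": both results use a's labels; b is NaN where it has no match
left_a, left_b = xr.align(a, b, join="left")
# "right": both results use b's labels; a is NaN where it has no match
right_a, right_b = xr.align(a, b, join="right")
```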

Both reindex_like and align work interchangeably between DataArray and Dataset objects, and with any number of matching dimension names:

In [110]: ds
Out[110]:
<xarray.Dataset> Size: 468B
Dimensions:    (month: 2, latitude: 3, longitude: 3)
Coordinates:
  * longitude  (longitude) float32 12B 11.25 12.0 12.75
  * latitude   (latitude) float32 12B 48.0 47.25 46.5
    level      int32 4B 200
  * month      (month) int32 8B 1 7
Data variables:
    z          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 1.0
    u          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 1.0
    v          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 1.0
Attributes:
    Conventions:  CF-1.0
    Info:         Monthly ERA-Interim data. Downloaded and edited by fabien.m...

In [111]: ds.reindex_like(baz)
Out[111]:
<xarray.Dataset> Size: 468B
Dimensions:    (month: 2, latitude: 3, longitude: 3)
Coordinates:
  * longitude  (longitude) float32 12B 11.25 12.0 12.75
  * latitude   (latitude) float32 12B 48.0 47.25 46.5
    level      int32 4B 200
  * month      (month) int32 8B 1 7
Data variables:
    z          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 1.0
    u          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 1.0
    v          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 1.0
Attributes:
    Conventions:  CF-1.0
    Info:         Monthly ERA-Interim data. Downloaded and edited by fabien.m...

In [112]: other = xr.DataArray(["a", "b", "c"], dims="other")

# this is a no-op, because there are no shared dimension names
In [113]: ds.reindex_like(other)
Out[113]:
<xarray.Dataset> Size: 468B
Dimensions:    (month: 2, latitude: 3, longitude: 3)
Coordinates:
  * longitude  (longitude) float32 12B 11.25 12.0 12.75
  * latitude   (latitude) float32 12B 48.0 47.25 46.5
    level      int32 4B 200
  * month      (month) int32 8B 1 7
Data variables:
    z          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 1.0
    u          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 1.0
    v          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 1.0
Attributes:
    Conventions:  CF-1.0
    Info:         Monthly ERA-Interim data. Downloaded and edited by fabien.m...

Missing coordinate labels

Coordinate labels for each dimension are optional (as of xarray v0.9). Label-based indexing with .sel and .loc uses standard positional, integer-based indexing as a fallback for dimensions without a coordinate label:

In [114]: da = xr.DataArray([1, 2, 3], dims="x")

In [115]: da.sel(x=[0, -1])
Out[115]:
<xarray.DataArray (x: 2)> Size: 16B
array([1, 3])
Dimensions without coordinates: x
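A short sketch of the fallback, comparing .sel and .loc with the equivalent .isel calls on the same unlabeled array:

```python
import xarray as xr

# "x" is a dimension without a coordinate
da = xr.DataArray([1, 2, 3], dims="x")

# There are no labels to look up, so .sel and .loc treat the values as positions
by_list = da.sel(x=[0, -1])
by_slice = da.sel(x=slice(0, 2))
by_loc = da.loc[[0, -1]]

print(by_list.values, by_slice.values, by_loc.values)
print(by_list.identical(da.isel(x=[0, -1])))
print(by_slice.identical(da.isel(x=slice(0, 2))))
```

Because the fallback is purely positional, slice(0, 2) here stops before position 2, as a NumPy slice would; slices over actual coordinate labels include both endpoints.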

Alignment between xarray objects where one or both do not have coordinate labels succeeds only if all dimensions of the same name have the same length. Otherwise, it raises an informative error:

In [116]: xr.align(da, da[:2])
ValueError: arguments without labels along dimension 'x' cannot be aligned because they have different dimension sizes: {2, 3}
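The sketch below shows both outcomes side by side. It only checks that a ValueError is raised, since the exact wording of the message may differ between xarray versions:

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x")
same_size = xr.DataArray([10, 20, 30], dims="x")

# Same length along the unlabeled dimension: alignment succeeds
a, b = xr.align(da, same_size)

# Different lengths and no labels to match on: alignment fails
try:
    xr.align(da, da[:2])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```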

Underlying Indexes

Xarray uses pandas.Index objects internally to perform indexing operations. If you need to access the underlying indexes, they are available through the indexes attribute.

In [117]: da = xr.DataArray(
   .....:     np.random.rand(4, 3),
   .....:     [
   .....:         ("time", pd.date_range("2000-01-01", periods=4)),
   .....:         ("space", ["IA", "IL", "IN"]),
   .....:     ],
   .....: )
   .....:

In [118]: da
Out[118]:
<xarray.DataArray (time: 4, space: 3)> Size: 96B
array([[0.171, 0.395, 0.642],
       [0.275, 0.462, 0.871],
       [0.401, 0.611, 0.118],
       [0.702, 0.414, 0.342]])
Coordinates:
  * time     (time) datetime64[ns] 32B 2000-01-01 2000-01-02 ... 2000-01-04
  * space    (space) <U2 24B 'IA' 'IL' 'IN'

In [119]: da.indexes
Out[119]:
Indexes:
    time     DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04'], dtype='datetime64[ns]', name='time', freq='D')
    space    Index(['IA', 'IL', 'IN'], dtype='object', name='space')

In [120]: da.indexes["time"]
Out[120]: DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04'], dtype='datetime64[ns]', name='time', freq='D')
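Since these are ordinary pandas objects, pandas methods work on them directly. A minimal sketch, using zeros instead of random data so the result is reproducible: looking up a label with pandas.Index.get_loc and then selecting by position with isel gives the same result as sel, which loosely mirrors how a single, unique label is turned into a position:

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.zeros((4, 3)),
    [
        ("time", pd.date_range("2000-01-01", periods=4)),
        ("space", ["IA", "IL", "IN"]),
    ],
)

space_index = da.indexes["space"]  # a pandas.Index
pos = space_index.get_loc("IN")    # pandas label lookup -> integer position

# Positional selection at that position matches label-based selection
same = da.isel(space=pos).identical(da.sel(space="IN"))
print(type(da.indexes["time"]).__name__, pos, same)
```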

Use get_index() to get an index for a dimension, falling back to a default pandas.RangeIndex if it has no coordinate labels:

In [121]: da = xr.DataArray([1, 2, 3], dims="x")

In [122]: da
Out[122]:
<xarray.DataArray (x: 3)> Size: 24B
array([1, 2, 3])
Dimensions without coordinates: x

In [123]: da.get_index("x")
Out[123]: RangeIndex(start=0, stop=3, step=1, name='x')
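A quick comparison on an unlabeled dimension: the indexes mapping holds only indexes that actually exist, while get_index builds the default one on request:

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x")

has_stored_index = "x" in da.indexes  # False: "x" has no coordinate labels
idx = da.get_index("x")               # default integer index, built on request
print(has_stored_index)
print(idx)
```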

Copies vs. Views

Whether array indexing returns a view or a copy of the underlying data depends on the kind of indexer used.

For positional (integer) indexing, xarray follows the same rules as NumPy:

  • Positional indexing with only integers and slices returns a view.

  • Positional indexing with arrays or lists returns a copy.

The rules for label-based indexing are more complex:

  • Label-based indexing with only slices returns a view.

  • Label-based indexing with arrays returns a copy.

  • Label-based indexing with scalars returns a view or a copy, depending on whether the corresponding positional indexer can be represented as an integer or a slice object. The exact rules are determined by pandas.
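These rules can be checked with numpy.shares_memory, which reports whether two arrays overlap in memory. A small sketch with illustrative data; the scalar-label case is left out because it depends on pandas:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    coords={"y": ["a", "b", "c"]},
    dims=("x", "y"),
)
base = arr.values  # the original NumPy buffer

# Positional: slices give views, lists give copies
pos_slice = arr[:, 0:2]
pos_list = arr[:, [0, 1]]

# Label-based: slices give views, arrays of labels give copies
lab_slice = arr.sel(y=slice("a", "b"))
lab_list = arr.sel(y=["a", "b"])

for name, result in [
    ("positional slice", pos_slice),
    ("positional list", pos_list),
    ("label slice", lab_slice),
    ("label list", lab_list),
]:
    print(name, "shares memory:", np.shares_memory(result.values, base))
```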

Whether data is a copy or a view is more predictable in xarray than in pandas, so unlike pandas, xarray does not produce SettingWithCopy warnings. However, you should still avoid assignment with chained indexing.
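A minimal sketch of the safer pattern, with illustrative data: index and assign in a single step with loc or a dict of dimension names, rather than assigning into the result of an earlier selection:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.zeros((2, 3)),
    coords={"x": [0, 1], "y": ["a", "b", "c"]},
    dims=("x", "y"),
)

# Avoid chained assignment such as arr.sel(y=["a", "b"])[0] = 1.0:
# here the intermediate selection is a copy, so arr would not change.
# Instead, index and assign in one step:
arr.loc[dict(x=0, y="a")] = 1.0  # label-based
arr[dict(y=2)] = 5.0             # positional, by dimension name

print(arr.values)
```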

Note that other operations (such as accessing the values property) may also return views rather than copies.

Multi-level indexing

Just like pandas, advanced indexing on multi-level indexes is possible with loc and sel. You can slice a multi-index by providing multiple indexers, i.e., a tuple of slices, labels, lists of labels, or any selector allowed by pandas:

In [124]: midx = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("one", "two"))

In [125]: mda = xr.DataArray(np.random.rand(6, 3), [("x", midx), ("y", range(3))])

In [126]: mda
Out[126]:
<xarray.DataArray (x: 6, y: 3)> Size: 144B
array([[0.596, 0.2  , 0.1  ],
       [0.735, 0.017, 0.481],
       [0.096, 0.497, 0.839],
       [0.897, 0.733, 0.759],
       [0.561, 0.471, 0.139],
       [0.094, 0.942, 0.134]])
Coordinates:
  * x        (x) object 48B MultiIndex
  * one      (x) object 48B 'a' 'a' 'b' 'b' 'c' 'c'
  * two      (x) int64 48B 0 1 0 1 0 1
  * y        (y) int64 24B 0 1 2

In [127]: mda.sel(x=(list("ab"), [0]))
Out[127]:
<xarray.DataArray (x: 2, y: 3)> Size: 48B
array([[0.596, 0.2  , 0.1  ],
       [0.096, 0.497, 0.839]])
Coordinates:
  * x        (x) object 16B MultiIndex
  * one      (x) object 16B 'a' 'b'
  * two      (x) int64 16B 0 0
  * y        (y) int64 24B 0 1 2
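The outputs above depend on random data. Here is a deterministic sketch of the same tuple-of-lists selection. It builds the coordinates with xr.Coordinates.from_pandas_multiindex, which assumes xarray 2023.08 or later (passing the MultiIndex directly, as above, also works, but newer versions may emit a FutureWarning):

```python
import numpy as np
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("one", "two"))
# Assumes xarray >= 2023.08 for Coordinates.from_pandas_multiindex
coords = xr.Coordinates.from_pandas_multiindex(midx, "x")
# Deterministic data: row k holds 3k, 3k + 1, 3k + 2
mda = xr.DataArray(np.arange(18).reshape(6, 3), coords=coords, dims=("x", "y"))

# One selector per level: "one" in ["a", "b"] and "two" equal to 0
sub = mda.sel(x=(["a", "b"], [0]))
print(sub["one"].values, sub["two"].values)
print(sub.values)
```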

You can also select multiple elements by providing a list of labels or tuples or a slice of tuples:

In [128]: mda.sel(x=[("a", 0), ("b", 1)])
Out[128]:
<xarray.DataArray (x: 2, y: 3)> Size: 48B
array([[0.596, 0.2  , 0.1  ],
       [0.897, 0.733, 0.759]])
Coordinates:
  * x        (x) object 16B MultiIndex
  * one      (x) object 16B 'a' 'b'
  * two      (x) int64 16B 0 1
  * y        (y) int64 24B 0 1 2

Additionally, xarray supports dictionaries:

In [129]: mda.sel(x={"one": "a", "two": 0})
Out[129]:
<xarray.DataArray (y: 3)> Size: 24B
array([0.596, 0.2  , 0.1  ])
Coordinates:
    x        object 8B ('a', 0)
    one      <U1 4B 'a'
    two      int64 8B 0
  * y        (y) int64 24B 0 1 2

For convenience, sel also accepts multi-index levels directly as keyword arguments:

In [130]: mda.sel(one="a", two=0)
Out[130]:
<xarray.DataArray (y: 3)> Size: 24B
array([0.596, 0.2  , 0.1  ])
Coordinates:
    x        object 8B ('a', 0)
    one      <U1 4B 'a'
    two      int64 8B 0
  * y        (y) int64 24B 0 1 2
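With deterministic data it is easy to confirm which rows the list-of-tuples, dict, and keyword-argument forms pick. This sketch assumes xarray 2023.08 or later for xr.Coordinates.from_pandas_multiindex:

```python
import numpy as np
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("one", "two"))
# Assumes xarray >= 2023.08 for Coordinates.from_pandas_multiindex
coords = xr.Coordinates.from_pandas_multiindex(midx, "x")
# Deterministic data: row k holds 3k, 3k + 1, 3k + 2
mda = xr.DataArray(np.arange(18).reshape(6, 3), coords=coords, dims=("x", "y"))

# A list of full (one, two) tuples picks individual rows
pair = mda.sel(x=[("a", 0), ("b", 1)])
# A dict of level labels and level keyword arguments select the same row
by_dict = mda.sel(x={"one": "a", "two": 0})
by_kwargs = mda.sel(one="a", two=0)

print(pair.values)
print(by_dict.values, by_kwargs.values)
```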

Note that with sel it is not possible to mix a dimension indexer with level indexers for that dimension (e.g., mda.sel(x={'one': 'a'}, two=0) will raise a ValueError).
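A short sketch of the failing combination and the level-only alternative, on a deterministic array (xarray 2023.08 or later assumed for xr.Coordinates.from_pandas_multiindex). The check relies only on the exception type, not on its message:

```python
import numpy as np
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("one", "two"))
# Assumes xarray >= 2023.08 for Coordinates.from_pandas_multiindex
coords = xr.Coordinates.from_pandas_multiindex(midx, "x")
mda = xr.DataArray(np.arange(18).reshape(6, 3), coords=coords, dims=("x", "y"))

# Mixing the dimension indexer "x" with the level indexer "two" is rejected
try:
    mda.sel(x={"one": "a"}, two=0)
    mixed_raised = False
except ValueError as err:
    mixed_raised = True
    print("ValueError:", err)

# Using level names throughout works
ok = mda.sel(one="a", two=0)
print(ok.values)
```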

Like pandas, xarray handles partial selection on a multi-index (level drop). As shown below, it also renames the dimension / coordinate when the multi-index is reduced to a single index.

In [131]: mda.loc[{"one": "a"}, ...]
Out[131]:
<xarray.DataArray (two: 2, y: 3)> Size: 48B
array([[0.596, 0.2  , 0.1  ],
       [0.735, 0.017, 0.481]])
Coordinates:
  * two      (two) int64 16B 0 1
  * y        (y) int64 24B 0 1 2
    one      <U1 4B 'a'
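With deterministic data, the level drop and the renamed dimension are easy to check. This sketch assumes xarray 2023.08 or later for xr.Coordinates.from_pandas_multiindex:

```python
import numpy as np
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("one", "two"))
# Assumes xarray >= 2023.08 for Coordinates.from_pandas_multiindex
coords = xr.Coordinates.from_pandas_multiindex(midx, "x")
mda = xr.DataArray(np.arange(18).reshape(6, 3), coords=coords, dims=("x", "y"))

# Selecting on the "one" level only drops that level; the remaining
# level "two" becomes the dimension of the result
part = mda.loc[{"one": "a"}, ...]

print(part.dims)
print(part["two"].values, part["one"].item())
print(part.values)
```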

Unlike pandas, xarray does not guess whether you provide index levels or dimensions when using loc in some ambiguous cases. For example, for mda.loc[{'one': 'a', 'two': 0}] and mda.loc['a', 0], xarray always interprets ('one', 'two') and ('a', 0) as the names and labels of the 1st and 2nd dimension, respectively. You must specify all dimensions or use the ellipsis in the loc specifier, e.g. in the example above, mda.loc[{'one': 'a', 'two': 0}, :] or mda.loc[('a', 0), ...].
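A sketch of the two unambiguous spellings on a deterministic array (xarray 2023.08 or later assumed for xr.Coordinates.from_pandas_multiindex); both select the single row labelled ('a', 0):

```python
import numpy as np
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("one", "two"))
# Assumes xarray >= 2023.08 for Coordinates.from_pandas_multiindex
coords = xr.Coordinates.from_pandas_multiindex(midx, "x")
mda = xr.DataArray(np.arange(18).reshape(6, 3), coords=coords, dims=("x", "y"))

# The whole (one, two) tuple is the label for "x"; the ellipsis covers "y"
row = mda.loc[("a", 0), ...]
# A dict of level labels for "x", followed by a full slice for "y"
row_dict = mda.loc[{"one": "a", "two": 0}, :]

print(row.dims, row.values, row_dict.values)
```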

Indexing rules

Here we describe the full rules xarray uses for vectorized indexing. Note that this is for the purposes of explanation: for the sake of efficiency and to support various backends, the actual implementation is different.

  1. (Only for label-based indexing.) Look up positional indexes along each dimension from the corresponding pandas.Index.

  2. A full slice object : is inserted for each dimension without an indexer.

  3. slice objects are converted into arrays, given by np.arange(*slice.indices(...)).

  4. Assume dimension names for array indexers without dimensions, such as np.ndarray and list, from the dimensions to be indexed along. For example, v.isel(x=[0, 1]) is understood as v.isel(x=xr.DataArray([0, 1], dims=['x'])).

  5. For each variable in a Dataset or DataArray (the array and its coordinates):

    1. Broadcast all relevant indexers based on their dimension names (see Broadcasting by dimension name for full details).

    2. Index the underlying array by the broadcast indexers, using NumPy’s advanced indexing rules.

  6. If any indexer DataArray has coordinates and no coordinate with the same name exists, attach them to the indexed object.
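The observable effect of rules 4 to 6 can be checked on a small integer array. This sketch (ix, iy, and the points labels are illustrative) compares a bare list with an explicit DataArray indexer, then selects (x, y) pairs pointwise and cross-checks the values against NumPy's own advanced indexing. It demonstrates behavior only, not how xarray implements it:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12).reshape(3, 4), dims=("x", "y"))

# Rule 4: a bare list is treated as if it were a DataArray along "x"
from_list = da.isel(x=[0, 2])
from_dataarray = da.isel(x=xr.DataArray([0, 2], dims="x"))
print(np.array_equal(from_list.values, from_dataarray.values))

# Rules 5 and 6: both indexers live on the new dimension "points", so they
# broadcast together and select (x, y) pairs pointwise; their "points"
# coordinate is attached to the result
labels = ["p", "q", "r"]
ix = xr.DataArray([0, 1, 2], dims="points", coords={"points": labels})
iy = xr.DataArray([3, 2, 1], dims="points", coords={"points": labels})
picked = da.isel(x=ix, y=iy)

# The same positions via NumPy advanced indexing on the raw array
numpy_picked = da.values[[0, 1, 2], [3, 2, 1]]
print(picked.values, numpy_picked)
print(picked.dims, picked["points"].values)
```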

Note

Only 1-dimensional boolean arrays can be used as indexers.
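For example, a 1-dimensional boolean NumPy array, or a boolean DataArray along the indexed dimension (here built from a coordinate comparison), selects the positions where it is True. A small sketch with illustrative data:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(12).reshape(3, 4),
    coords={"x": [10, 20, 30]},
    dims=("x", "y"),
)

# A 1-D boolean NumPy array along "x"
mask = np.array([True, False, True])
by_mask = da.isel(x=mask)

# A 1-D boolean DataArray along "x", built from the coordinate
by_condition = da.isel(x=da["x"] > 15)

print(by_mask["x"].values, by_condition["x"].values)
```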

Article information

Author: Margart Wisoky
