Temporal interpolation and re-indexing time_interpolation/std

Description

time_interpolation transform: re-index data from one time grid to another.

Interpolates or resamples gridded (xarray) and observation-indexed (sparse / sampled pandas DataFrame) data to match the temporal resolution required by the succeeding transform in the pipeline.

Two data shapes are handled:

  • Array data (sparse_in = False, sparse_out = False) — an xarray DataArray of shape (time, lev, lat, lon) is resampled, with duration-based weights.

  • Sparse / sampled data (sparse_out = True or sampled_out = True) — a pandas DataFrame indexed by observation is resampled to the new date window. When recombine_periods = True, observations overlapping multiple sub-simulation periods are combined proportionally.
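The duration-based weighting mentioned above can be sketched as follows. This is a minimal illustration, not the plugin's own code: it uses plain NumPy arrays instead of xarray to stay self-contained, and the helper name overlap_weights, the interval edges, and the test field are all assumptions made for the example. Each output value is the average of the input values, weighted by how long each input interval overlaps the output interval.

```python
import numpy as np

def overlap_weights(in_edges, out_edges):
    """Hypothetical helper: weights of shape (n_out, n_in).

    Each row holds the fraction of one output interval that is
    covered by each input interval.
    """
    in_start, in_end = in_edges[:-1], in_edges[1:]
    out_start, out_end = out_edges[:-1], out_edges[1:]
    # Pairwise overlap duration between every output and input interval
    overlap = (np.minimum(out_end[:, None], in_end[None, :])
               - np.maximum(out_start[:, None], in_start[None, :]))
    overlap = np.clip(overlap, 0.0, None)
    # Normalize by the length of each output interval
    return overlap / (out_end - out_start)[:, None]

# Assumed example: six 3-hourly input steps mapped onto two uneven windows
in_edges = np.arange(0.0, 19.0, 3.0)    # hours: 0, 3, ..., 18
out_edges = np.array([0.0, 8.0, 18.0])  # hours
weights = overlap_weights(in_edges, out_edges)

# Input field of shape (time, lev, lat, lon); value equals the time index
data = np.arange(6, dtype=float).reshape(6, 1, 1, 1) * np.ones((6, 2, 3, 4))

# Contract the time axis: the result has shape (n_out, lev, lat, lon)
resampled = np.tensordot(weights, data, axes=(1, 0))
print(resampled.shape)
```

Because the output windows here are fully covered by the input intervals, every row of weights sums to one, so a constant field is left unchanged by the resampling.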

Temporal interpolation indexes are pre-computed in ini_mapper() via calc_indexes() and cached in the mapper so that the forward and adjoint passes do not recompute them.
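The benefit of pre-computing indexes can be illustrated with a small sketch. The functions below are hypothetical stand-ins and do not reproduce the actual ini_mapper() or calc_indexes() implementations; the cache is represented by a plain dictionary, and holding the nearest value for output dates outside the input range is an assumption. The point is that the same bracketing indexes and weights serve both the forward interpolation and its adjoint.

```python
import numpy as np

def calc_linear_indexes(t_in, t_out):
    """Hypothetical stand-in: bracketing indexes and linear weights."""
    # Index of the input date just after each output date, kept in range
    hi = np.clip(np.searchsorted(t_in, t_out, side="right"), 1, len(t_in) - 1)
    lo = hi - 1
    # Weight of the upper bracket; clipping holds the nearest value
    # for output dates outside the input range (an assumption)
    w_hi = np.clip((t_out - t_in[lo]) / (t_in[hi] - t_in[lo]), 0.0, 1.0)
    return lo, hi, w_hi

def forward(x, idx):
    """Linear interpolation using pre-computed indexes."""
    lo, hi, w_hi = idx
    return (1.0 - w_hi) * x[lo] + w_hi * x[hi]

def adjoint(y_adj, idx, n_in):
    """Transpose of forward(): scatter sensitivities back to input dates."""
    lo, hi, w_hi = idx
    x_adj = np.zeros(n_in)
    np.add.at(x_adj, lo, (1.0 - w_hi) * y_adj)
    np.add.at(x_adj, hi, w_hi * y_adj)
    return x_adj

t_in = np.array([0.0, 6.0, 12.0, 18.0])   # input dates, hours
t_out = np.array([0.0, 3.0, 9.0, 18.0])   # output dates, hours

# Computed once and stored; both passes read from the cache
cache = {"indexes": calc_linear_indexes(t_in, t_out)}

x = np.array([1.0, 2.0, 4.0, 8.0])
y = forward(x, cache["indexes"])

# Dot-product test: <y_adj, F x> must equal <F^T y_adj, x>
y_adj = np.ones_like(y)
x_adj = adjoint(y_adj, cache["indexes"], len(t_in))
assert np.isclose(np.dot(y_adj, y), np.dot(x_adj, x))
```

The closing assertion is the usual consistency check between a linear operator and its adjoint; it holds here because both functions apply the same cached weights.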

Ensemble (batch sampling) runs are supported: multiple __sample#N tracers are processed in parallel with nthreads threads (defaulting to the number of available CPUs).
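As a rough illustration of processing ensemble members in parallel, the sketch below distributes per-sample interpolation over a thread pool from the Python standard library. It is not the plugin's implementation: the tracer names, the weight matrix, and the interpolate_sample helper are assumptions made for the example, and the thread count is simply taken from the number of available CPUs.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def interpolate_sample(values, weights):
    """Hypothetical per-sample work: apply shared temporal weights."""
    return weights @ values

# Assumed weights mapping three input dates onto two output dates
weights = np.array([[0.5, 0.5, 0.0],
                    [0.0, 0.25, 0.75]])

# Assumed ensemble: four tracers following the __sample#N naming pattern
samples = {
    f"CO2__sample#{n}": np.array([1.0, 2.0, 3.0]) * (n + 1)
    for n in range(4)
}

# Assumption: use every available CPU, falling back to a single thread
nthreads = os.cpu_count() or 1

with ThreadPoolExecutor(max_workers=nthreads) as pool:
    # pool.map preserves input order, so names and results stay aligned
    outputs = pool.map(lambda v: interpolate_sample(v, weights),
                       samples.values())
    results = dict(zip(samples, outputs))

print(sorted(results))
```

All samples share the same weights, which is why pre-computed indexes pair naturally with this kind of parallel processing.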

YAML arguments

The following arguments configure the plugin. pyCIF raises an exception at initialization if a mandatory argument is missing, or if any argument has a value or type that is not accepted:

Mandatory arguments

method : “linear”, mandatory

Method by which the original data is temporally interpolated onto the output time-scale

Optional arguments

parameter : str, optional

Name of the parameter on which the transform operates

component : str, optional

Name of the component on which the transform operates

orig_parameter_plg : Plugin, optional

Plugin object on which the transform operates

orig_component_plg : Plugin, optional

Corresponding component object on which the transform operates

successor : str, optional

Name of the successor transform

precursor : str, optional

Name of the precursor transform

recombine_periods : str, optional, default True

Recombine inputs from different sub-periods. If False, data overlapping several periods are taken from the period with the largest overlap with the outputs

sparse_in : bool, optional, default False

Set to True when the input data is a pandas DataFrame (observation-indexed sparse format) rather than a gridded xarray DataArray.

sparse_out : bool, optional, default False

Set to True when the output should be a pandas DataFrame (observation-indexed sparse format).

sampled_in : bool, optional, default False

Set to True when the input is already sampled at observation locations (i.e. the sampled flag is set in the preceding transform’s mapper).

sampled_out : bool, optional, default False

Set to True when the output should be delivered as observation-sampled data.

nthreads : int, optional, default 1

Number of parallel threads for ensemble (batch sampling) processing. Defaults to the number of available CPUs.

debug_crop : int, optional, default 10000

Maximum number of dates to print in debug log messages. Raise to see the full date list; lower to keep logs readable.
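The two behaviours selected by recombine_periods can be compared with a small numeric sketch. It is a simplified illustration under stated assumptions, not the plugin's code: the period boundaries, the single value attached to each period, the observation window, and the overlap helper are all hypothetical.

```python
import numpy as np

def overlap(a_start, a_end, b_start, b_end):
    """Duration shared by two intervals (zero if they do not intersect)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

# Assumed sub-simulation periods (hours) and one value per period
periods = [(0.0, 24.0), (24.0, 48.0)]
period_values = np.array([10.0, 20.0])

# Assumed observation window straddling both periods
obs_window = (20.0, 32.0)

# Overlap of the observation window with each period
durations = np.array([overlap(*obs_window, *p) for p in periods])

# recombine_periods = True: combine proportionally to the overlap
recombined = np.sum(durations * period_values) / np.sum(durations)

# recombine_periods = False: keep only the period with the largest overlap
largest_only = period_values[np.argmax(durations)]

print(recombined, largest_only)
```

Here the window spends 4 hours in the first period and 8 hours in the second, so the proportional combination leans toward the second period while the non-recombined result takes the second period's value alone.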

YAML template

Please find below a template for a YAML configuration:

transform:
  plugin:
    name: time_interpolation
    version: std
    type: transform

  # Mandatory arguments
  method: XXXXX  # linear

  # Optional arguments
  parameter: XXXXX  # str
  component: XXXXX  # str
  orig_parameter_plg: XXXXX  # Plugin
  orig_component_plg: XXXXX  # Plugin
  successor: XXXXX  # str
  precursor: XXXXX  # str
  recombine_periods: XXXXX  # str
  sparse_in: XXXXX  # bool
  sparse_out: XXXXX  # bool
  sampled_in: XXXXX  # bool
  sampled_out: XXXXX  # bool
  nthreads: XXXXX  # int
  debug_crop: XXXXX  # int