Response functions response-functions/std#

Description#

Tutorial: How to run response functions

Compute the explicit Jacobian \(\mathbf{H}\) of the observation operator column by column (response functions / base functions), and optionally perform a direct analytical inversion.

Mathematical framework#

Building H — response functions#

The observation operator \(\mathcal{H}\) is assumed to be linear (or linearised around \(\mathbf{x}_b\) when run_mode = 'tl'). Its Jacobian \(\mathbf{H} \in \mathbb{R}^{m \times n}\) is assembled as:

\[\mathbf{H}_{:,\,i} = \mathcal{H}(\mathbf{e}_i), \quad i = 1, \ldots, n\]

where \(\mathbf{e}_i\) is the \(i\)-th canonical basis vector (Dirac flux: all zeros except element \(i\) set to 1). Each column is obtained by one independent forward (or TL) pyCIF simulation stored under $workdir/base_functions/.

By linearity, the full simulated observation vector is recovered as:

\[\mathbf{y}_\mathrm{sim} = \mathbf{H}\,\mathbf{x} = \sum_{i=1}^{n} x_i\,\mathbf{H}_{:,\,i}\]

Run mode#

  • run_mode = 'tl' (default) — evaluates the tangent-linear operator; a reference forward run \(\mathcal{H}(\mathbf{x}_b)\) is performed first and stored in the sim column of the obs-vector. The TL response is stored in sim_tl.

  • run_mode = 'fwd' — evaluates the full non-linear operator for each basis vector; no reference run is needed.

Analytical inversion (optional)#

When analytical_inversion = True, a BLUE posterior is computed from the assembled \(\mathbf{H}\) matrix:

\[\mathbf{x}_a = \mathbf{x}_b + \mathbf{K}\bigl(\mathbf{y} - \mathbf{H}\mathbf{x}_b\bigr), \qquad \mathbf{K} = \mathbf{B}\mathbf{H}^\top \bigl(\mathbf{R} + \mathbf{H}\mathbf{B}\mathbf{H}^\top\bigr)^{-1}\]

By default the matrix inversion uses the standard formula \((\mathbf{R} + \mathbf{H}\mathbf{B}\mathbf{H}^\top)^{-1}\), which is \(\mathcal{O}(m^3)\). When \(n \ll m\), the Woodbury identity

\[(\mathbf{R} + \mathbf{H}\mathbf{B}\mathbf{H}^\top)^{-1} = \mathbf{R}^{-1} - \mathbf{R}^{-1}\mathbf{H} (\mathbf{B}^{-1} + \mathbf{H}^\top\mathbf{R}^{-1}\mathbf{H})^{-1} \mathbf{H}^\top\mathbf{R}^{-1}\]

reduces the dominant cost to \(\mathcal{O}(n^3)\) and can be enabled with use_woodbury_identity = True (or 'auto').

Outputs#

Observation vector#

The aggregated simulated observation vector \(\mathbf{y} = \mathbf{H}\,\mathbf{x}_b\) is stored in $workdir/obsvect/. The column filled depends on run_mode: sim for 'fwd', sim_tl for 'tl'.

\(\mathbf{H}\) matrix#

The full \(\mathbf{H}\) matrix \(= (\mathbf{y}_1^\top, \ldots, \mathbf{y}_n^\top)\) is stored in $workdir/h_matrix.nc.

Per-parameter decompositions (sub-matrices of \(\mathbf{H}\) reshaped to the physical dimensions of each tracer) are stored in $workdir/base_functions/decomposition/.

Restartability#

Note

Response functions that did not produce output are automatically re-run on restart — only missing or failed simulations are resubmitted. See the autorestart argument of the obsoperator plugin for details.

Warning

One simulation per control-vector dimension \(n\) is required. Check \(n\) and the cost of one forward run before launching. Use the dryrun option to estimate total wall-clock time.

YAML arguments#

The following arguments are used to configure the plugin. pyCIF will return an exception at the initialization if mandatory arguments are not specified, or if any argument does not fit accepted values or type:

dryrun : bool, optional, default False

Submit only the first response function to estimate per-run cost, then stop without building the full H matrix.

run_mode : “fwd” or “tl”, optional, default “tl”

Operator evaluated for each response function. 'tl' requires a reference forward run (set automatically); see also use_model_approximation.

autoflush : bool, optional, default False

Delete temporary files not already removed by the model plugin’s flushrun method after each response function.

reload_results : bool, optional, default True

Reload response functions results from previous simulations. If set to true already computed simulations will not be run. Affect both the eventual reference forward simulation and the response functions simulations.

reload_h_matrix : “str or list of str”, optional

Reload the H matrix from previous simulations. If this argument is used, the computation of the response functions will be skipped and the H matrix will be read from the provided path(s). If multiple paths are provided, the H matrices will be summed.

clamp_h_matrix_to_zero : bool, optional, default True

Set negative H matrix elements to zero (enforces physical non-negativity).

analytical_inversion : bool, optional, default False

After building H, perform a direct BLUE inversion \(x_a = x_b + K(y - Hx_b)\). See module docstring for equations.

use_woodbury_identity : “bool or ‘auto’”, optional, default “auto”

Use Woodbury matrix identity to compute the inverse of \(\left( \mathbf{R} + \mathbf{H}\mathbf{B}\mathbf{H}^T \right)\). Decreases computation time significantly when \(\mathrm{dim}(\mathbf{R}) \gg \mathrm{dim}(\mathbf{B})\) (significantly increase the computation time otherwise). When this option is set to “auto” the method used is chosen according to \(\mathbf{R}\) and \(\mathbf{B}\) dimensions

full_period : bool, optional, default False

Run the response functions over the whole simulation windows. This argument cannot be set to true if the ‘spin_down’ arguments is used.

spin_down : str, optional

Extra simulation period appended after the control vector window to allow the model to reach the observations. Valid pandas frequency alias (e.g. '7D', '1MS'). Can be set globally here or per-tracer in the datavect (tracer-level value takes priority).

first_period_only : bool, optional, default False

Run only the response functions for the first control-vector period. Useful to estimate the per-period cost or the model relaxation time. Incompatible with full_period = True.

inicond_component : str, optional, default “inicond”

Initial conditions datavect component name

ignore_tracers : list, optional

List of [component, parameter] pairs to skip. Their H matrix columns are set to zero.

job_batch_size : int, optional, default 20

Number of response functions to submit before waiting for the batch to finish. 0 or negative submits all at once. Set to 1 to trigger file cleanup after every job.

pseudo_parallel_job : bool, optional, default False

Submit each batch as a single job script that launches all members in the background (cmd &wait). Requires job_batch_size > 0.

use_batch_sampling : bool, optional, default False

Group response functions by time period and run them via the obsoperator batch_computation mode instead of individual jobs.

batch_sampling_size : int, optional

Maximum number of response functions per batch (only used when use_batch_sampling = True).

separate_parameters : bool, optional, default False

Split batches by observation parameter (species). Only valid with use_batch_sampling = True.

independant_parameters : bool, optional, default False

Treat species as independent: each response function affects only its own tracer (and downstream transform outputs). Reduces batch size. Only valid with use_batch_sampling = True.

use_model_approximation : bool, optional, default False

Reuse the linearisation point from a reference forward run instead of re-running the full non-linear model for each response function. Requires run_mode = 'tl'.

run_reference_forward : bool, optional, default False

Run a reference forward \(\mathcal{H}(x_b)\) and store the result in the sim obs-vector column. Set automatically when run_mode = 'tl'.

dump_sparse_arrays : bool, optional, default False

Store H matrix NetCDF files using COO sparse arrays to save disk space. Convert back with pycif.utils.sparse_array.to_dense_dataset.

dump_obsvect_decompostion : bool, optional, default False

Dump the observation vector decomposed by response function (one column per control-vector element).

Requirements#

The current plugin requires the present plugins to run properly:

Requirement name

Requirement type

Explicit definition

Any valid

Default name

Default version

platform

Platform

True

True

None

None

model

Model

False

True

None

None

obsoperator

ObsOperator

True

True

standard

std

obsvect

ObsVect

False

True

standard

std

controlvect

ControlVect

True

True

standard

std

datavect

DataVect

True

True

standard

std

YAML template#

Please find below a template for a YAML configuration:

 1mode:
 2  plugin:
 3    name: response-functions
 4    version: std
 5    type: mode
 6
 7  # Optional arguments
 8  dryrun: XXXXX  # bool
 9  run_mode: XXXXX  # fwd|tl
10  autoflush: XXXXX  # bool
11  reload_results: XXXXX  # bool
12  reload_h_matrix: XXXXX  # str or list of str
13  clamp_h_matrix_to_zero: XXXXX  # bool
14  analytical_inversion: XXXXX  # bool
15  use_woodbury_identity: XXXXX  # bool or 'auto'
16  full_period: XXXXX  # bool
17  spin_down: XXXXX  # str
18  first_period_only: XXXXX  # bool
19  inicond_component: XXXXX  # str
20  ignore_tracers: XXXXX  # list
21  job_batch_size: XXXXX  # int
22  pseudo_parallel_job: XXXXX  # bool
23  use_batch_sampling: XXXXX  # bool
24  batch_sampling_size: XXXXX  # int
25  separate_parameters: XXXXX  # bool
26  independant_parameters: XXXXX  # bool
27  use_model_approximation: XXXXX  # bool
28  run_reference_forward: XXXXX  # bool
29  dump_sparse_arrays: XXXXX  # bool
30  dump_obsvect_decompostion: XXXXX  # bool