Response functions response-functions/std#
Description#
Tutorial: How to run response functions
Compute the explicit Jacobian \(\mathbf{H}\) of the observation operator column by column (response functions / base functions), and optionally perform a direct analytical inversion.
Mathematical framework#
Building H — response functions#
The observation operator \(\mathcal{H}\) is assumed to be linear (or
linearised around \(\mathbf{x}_b\) when run_mode = 'tl'). Its
Jacobian \(\mathbf{H} \in \mathbb{R}^{m \times n}\) is assembled as:
where \(\mathbf{e}_i\) is the \(i\)-th canonical basis vector (Dirac
flux: all zeros except element \(i\) set to 1). Each column is obtained
by one independent forward (or TL) pyCIF simulation stored under
$workdir/base_functions/.
By linearity, the full simulated observation vector is recovered as:
Run mode#
run_mode = 'tl'(default) — evaluates the tangent-linear operator; a reference forward run \(\mathcal{H}(\mathbf{x}_b)\) is performed first and stored in thesimcolumn of the obs-vector. The TL response is stored insim_tl.run_mode = 'fwd'— evaluates the full non-linear operator for each basis vector; no reference run is needed.
Analytical inversion (optional)#
When analytical_inversion = True, a BLUE posterior is computed from the
assembled \(\mathbf{H}\) matrix:
By default the matrix inversion uses the standard formula \((\mathbf{R} + \mathbf{H}\mathbf{B}\mathbf{H}^\top)^{-1}\), which is \(\mathcal{O}(m^3)\). When \(n \ll m\), the Woodbury identity
reduces the dominant cost to \(\mathcal{O}(n^3)\) and can be enabled
with use_woodbury_identity = True (or 'auto').
Outputs#
Observation vector#
The aggregated simulated observation vector
\(\mathbf{y} = \mathbf{H}\,\mathbf{x}_b\) is stored in
$workdir/obsvect/. The column filled depends on run_mode:
sim for 'fwd', sim_tl for 'tl'.
\(\mathbf{H}\) matrix#
The full \(\mathbf{H}\) matrix
\(= (\mathbf{y}_1^\top, \ldots, \mathbf{y}_n^\top)\)
is stored in $workdir/h_matrix.nc.
Per-parameter decompositions (sub-matrices of \(\mathbf{H}\) reshaped
to the physical dimensions of each tracer) are stored in
$workdir/base_functions/decomposition/.
Restartability#
Note
Response functions that did not produce output are automatically re-run
on restart — only missing or failed simulations are resubmitted. See the
autorestart argument of the obsoperator plugin for details.
Warning
One simulation per control-vector dimension \(n\) is required.
Check \(n\) and the cost of one forward run before launching.
Use the dryrun option to estimate total wall-clock time.
YAML arguments#
The following arguments are used to configure the plugin. pyCIF will return an exception at the initialization if mandatory arguments are not specified, or if any argument does not fit accepted values or type:
- dryrun : bool, optional, default False
Submit only the first response function to estimate per-run cost, then stop without building the full H matrix.
- run_mode : “fwd” or “tl”, optional, default “tl”
Operator evaluated for each response function.
'tl'requires a reference forward run (set automatically); see alsouse_model_approximation.
- autoflush : bool, optional, default False
Delete temporary files not already removed by the model plugin’s
flushrunmethod after each response function.
- reload_results : bool, optional, default True
Reload response functions results from previous simulations. If set to
truealready computed simulations will not be run. Affect both the eventual reference forward simulation and the response functions simulations.
- reload_h_matrix : “str or list of str”, optional
Reload the H matrix from previous simulations. If this argument is used, the computation of the response functions will be skipped and the H matrix will be read from the provided path(s). If multiple paths are provided, the H matrices will be summed.
- clamp_h_matrix_to_zero : bool, optional, default True
Set negative H matrix elements to zero (enforces physical non-negativity).
- analytical_inversion : bool, optional, default False
After building H, perform a direct BLUE inversion \(x_a = x_b + K(y - Hx_b)\). See module docstring for equations.
- use_woodbury_identity : “bool or ‘auto’”, optional, default “auto”
Use Woodbury matrix identity to compute the inverse of \(\left( \mathbf{R} + \mathbf{H}\mathbf{B}\mathbf{H}^T \right)\). Decreases computation time significantly when \(\mathrm{dim}(\mathbf{R}) \gg \mathrm{dim}(\mathbf{B})\) (significantly increase the computation time otherwise). When this option is set to “auto” the method used is chosen according to \(\mathbf{R}\) and \(\mathbf{B}\) dimensions
- full_period : bool, optional, default False
Run the response functions over the whole simulation windows. This argument cannot be set to
trueif the ‘spin_down’ arguments is used.
- spin_down : str, optional
Extra simulation period appended after the control vector window to allow the model to reach the observations. Valid pandas frequency alias (e.g.
'7D','1MS'). Can be set globally here or per-tracer in the datavect (tracer-level value takes priority).
- first_period_only : bool, optional, default False
Run only the response functions for the first control-vector period. Useful to estimate the per-period cost or the model relaxation time. Incompatible with
full_period = True.
- inicond_component : str, optional, default “inicond”
Initial conditions datavect component name
- ignore_tracers : list, optional
List of
[component, parameter]pairs to skip. Their H matrix columns are set to zero.
- job_batch_size : int, optional, default 20
Number of response functions to submit before waiting for the batch to finish.
0or negative submits all at once. Set to1to trigger file cleanup after every job.
- pseudo_parallel_job : bool, optional, default False
Submit each batch as a single job script that launches all members in the background (
cmd &…wait). Requiresjob_batch_size > 0.
- use_batch_sampling : bool, optional, default False
Group response functions by time period and run them via the obsoperator
batch_computationmode instead of individual jobs.
- batch_sampling_size : int, optional
Maximum number of response functions per batch (only used when
use_batch_sampling = True).
- separate_parameters : bool, optional, default False
Split batches by observation parameter (species). Only valid with
use_batch_sampling = True.
- independant_parameters : bool, optional, default False
Treat species as independent: each response function affects only its own tracer (and downstream transform outputs). Reduces batch size. Only valid with
use_batch_sampling = True.
- use_model_approximation : bool, optional, default False
Reuse the linearisation point from a reference forward run instead of re-running the full non-linear model for each response function. Requires
run_mode = 'tl'.
- run_reference_forward : bool, optional, default False
Run a reference forward \(\mathcal{H}(x_b)\) and store the result in the
simobs-vector column. Set automatically whenrun_mode = 'tl'.
- dump_sparse_arrays : bool, optional, default False
Store H matrix NetCDF files using COO sparse arrays to save disk space. Convert back with
pycif.utils.sparse_array.to_dense_dataset.
- dump_obsvect_decompostion : bool, optional, default False
Dump the observation vector decomposed by response function (one column per control-vector element).
Requirements#
The current plugin requires the present plugins to run properly:
Requirement name |
Requirement type |
Explicit definition |
Any valid |
Default name |
Default version |
|---|---|---|---|---|---|
platform |
True |
True |
None |
None |
|
model |
False |
True |
None |
None |
|
obsoperator |
True |
True |
standard |
std |
|
obsvect |
False |
True |
standard |
std |
|
controlvect |
True |
True |
standard |
std |
|
datavect |
True |
True |
standard |
std |
YAML template#
Please find below a template for a YAML configuration:
1mode:
2 plugin:
3 name: response-functions
4 version: std
5 type: mode
6
7 # Optional arguments
8 dryrun: XXXXX # bool
9 run_mode: XXXXX # fwd|tl
10 autoflush: XXXXX # bool
11 reload_results: XXXXX # bool
12 reload_h_matrix: XXXXX # str or list of str
13 clamp_h_matrix_to_zero: XXXXX # bool
14 analytical_inversion: XXXXX # bool
15 use_woodbury_identity: XXXXX # bool or 'auto'
16 full_period: XXXXX # bool
17 spin_down: XXXXX # str
18 first_period_only: XXXXX # bool
19 inicond_component: XXXXX # str
20 ignore_tracers: XXXXX # list
21 job_batch_size: XXXXX # int
22 pseudo_parallel_job: XXXXX # bool
23 use_batch_sampling: XXXXX # bool
24 batch_sampling_size: XXXXX # int
25 separate_parameters: XXXXX # bool
26 independant_parameters: XXXXX # bool
27 use_model_approximation: XXXXX # bool
28 run_reference_forward: XXXXX # bool
29 dump_sparse_arrays: XXXXX # bool
30 dump_obsvect_decompostion: XXXXX # bool
See also