Response functions (response-functions / std)#

Description#

Tutorial: How to run response functions

Computes response functions based on a given observation operator, control vector and observation vector.

It explicitly computes the observation operator \(\mathcal{H}(\mathbf{x})\), which is assumed to be linear, by running so-called base functions or response functions.

To do so, it computes \(\mathbf{y}_i = \mathcal{H}(\mathbf{x}_i)\), \(\forall\, 1 \leq i \leq \mathrm{dim}(\mathbf{x})\), where \(\mathbf{x}_i\) is the control vector with all elements set to zero except the \(i^\mathrm{th}\) one.
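
The short sketch below illustrates this construction with a toy linear operator; toy_H, the dimensions and the random values are placeholders for illustration only and are not pyCIF objects.

import numpy as np

# Toy dimensions and linear observation operator (placeholders, not pyCIF objects)
n_control, n_obs = 4, 3
rng = np.random.default_rng(0)
toy_H = rng.normal(size=(n_obs, n_control))   # stands for the linear operator H
x = rng.normal(size=n_control)                # control vector

# One response function per control vector element:
# x_i keeps only the i-th element of x, everything else is set to zero
responses = []
for i in range(n_control):
    x_i = np.zeros_like(x)
    x_i[i] = x[i]
    y_i = toy_H @ x_i                          # y_i = H(x_i), one pyCIF simulation each
    responses.append(y_i)

# For a linear operator, the sum of the responses equals H(x)
assert np.allclose(sum(responses), toy_H @ x)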

Response functions are computed as individual pyCIF simulations stored in $workdir/base_functions/

Note

The pyCIF process can be restarted if it stops because one or several response functions crash or do not produce the desired output. It will re-run the response functions that did not produce the desired output.

See the obsoperator plugin autorestart input argument for further details about restarting pyCIF simulations.

Warning

As one simulation per dimension of the control vector is needed for this mode, please first check the dimension of your control vector and the time required for each simulation. You can check the number of required simulations with the dryrun argument (see below).

Outputs#

Observation vector#

The full observation vector is obtained with \(\mathbf{y} = \sum_i \mathbf{y}_i\) and is stored in $workdir/obsvect/

The observation vector dump column corresponding to the run_mode argument is filled with \(\mathbf{y}\).

If the run_mode argument is set to 'tl' (default) and a reference forward simulation is run, the observation vector dump 'sim' column is filled with the observation vector from the reference forward simulation and the 'sim_tl' column is filled with \(\mathbf{y}\).
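
As a hedged illustration of how the dump columns relate, the snippet below uses a plain pandas DataFrame; only the 'sim' and 'sim_tl' column names come from the text above, the actual dump layout used by pyCIF is not reproduced here.

import numpy as np
import pandas as pd

# Per-response-function contributions y_i (toy values, 3 observations each)
y_parts = [np.array([0.1, 0.0, 0.2]), np.array([0.0, 0.3, 0.1])]
y = np.sum(y_parts, axis=0)                   # y = sum_i y_i

# Hypothetical dump layout: 'sim' from a reference forward run,
# 'sim_tl' from the response functions when run_mode is 'tl'
dump = pd.DataFrame({"sim": [0.15, 0.28, 0.31]})
dump["sim_tl"] = y
print(dump)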

\(\mathbf{H}\) matrix#

The \(\mathbf{H}\) matrix is obtained with \(\mathbf{H} = \left(\mathbf{y}_1^\mathrm{T}, \, \dots, \, \mathbf{y}_N^\mathrm{T} \right)\) and is stored in $workdir/h_matrix.nc

The \(\mathbf{H}\) matrix decomposition per control vector parameter is obtained by picking the \(\mathbf{H}\) matrix rows corresponding to each parameter and reshaping the resulting sub-matrix with the parameter dimensions. The decompositions are stored in $workdir/base_functions/decomposition/
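
A minimal numpy sketch of this assembly and decomposition, assuming a single parameter discretised on an illustrative 2 × 3 grid (all names and sizes are placeholders):

import numpy as np

n_obs = 5
# Responses y_i for one control vector parameter discretised on a 2 x 3 grid,
# i.e. 6 control vector elements (illustrative layout only)
param_shape = (2, 3)
n_i = int(np.prod(param_shape))
rng = np.random.default_rng(1)
y_list = [rng.normal(size=n_obs) for _ in range(n_i)]

# Stack the responses: one row per control vector element (see the text above)
H = np.vstack(y_list)                          # shape (n_i, n_obs)

# Decomposition for this parameter: pick its rows and reshape with the
# parameter dimensions, giving one field of responses per observation
H_param = H.reshape(param_shape + (n_obs,))    # shape (2, 3, n_obs)
print(H.shape, H_param.shape)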

YAML arguments#

The following arguments are used to configure the plugin. pyCIF will raise an exception at initialization if mandatory arguments are not specified, or if any argument does not fit the accepted values or types:

dryrun : bool, optional, default False

Create all response function input files, then stop. This option can be used to determine the number of response functions.

run_mode : “fwd” or “tl”, optional, default “tl”

Run mode of the response functions. If "tl" (tangent linear) is chosen and use_model_approximation is set to true, a reference forward simulation will be run.

autoflush : bool, optional, default False

Flush temporary files that are not already flushed by the model plugin flushrun method.

reload_results : bool, optional, default True

Reload response function results from previous simulations. If set to true, already computed simulations will not be run again. Affects both the reference forward simulation (if any) and the response function simulations.

reload_h_matrix : “str or list of str”, optional

Reload the H matrix from previous simulations. If this argument is used, the computation of the response functions will be skipped and the H matrix will be read from the provided path(s). If multiple paths are provided, the H matrices will be summed.
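
As a hedged sketch only: assuming each reloaded file stores the matrix in a single NetCDF variable (here called "H", a placeholder name; the real layout of h_matrix.nc may differ), the summation of several reloaded matrices could look like this with xarray.

import xarray as xr

# Hypothetical paths and variable name, for illustration only
paths = ["run_a/h_matrix.nc", "run_b/h_matrix.nc"]
datasets = [xr.open_dataset(p) for p in paths]
h_total = sum(ds["H"] for ds in datasets)      # element-wise sum of the reloaded H matrices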

clamp_h_matrix_to_zero : bool, optional, default True

Ensure all \(\mathbf{H}\) matrix elements are greater than or equal to zero by clamping negative values to zero.

analytical_inversion : bool, optional, default False

Do an analytical inversion with the \(\mathbf{H}\) matrix built from the response function results.

use_woodbury_identity : “bool or ‘auto’”, optional, default “auto”

Use the Woodbury matrix identity to compute the inverse of \(\left( \mathbf{R} + \mathbf{H}\mathbf{B}\mathbf{H}^\mathrm{T} \right)\). This decreases the computation time significantly when \(\mathrm{dim}(\mathbf{R}) \gg \mathrm{dim}(\mathbf{B})\) (and significantly increases it otherwise). When this option is set to “auto”, the method is chosen according to the \(\mathbf{R}\) and \(\mathbf{B}\) dimensions.
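
As a self-contained check of the identity (using the standard convention where \(\mathbf{H}\) has one row per observation and one column per control vector element; sizes and values below are arbitrary):

import numpy as np

rng = np.random.default_rng(2)
n_obs, n_ctl = 500, 20                         # dim(R) >> dim(B): Woodbury pays off
H = rng.normal(size=(n_obs, n_ctl))
R = np.diag(rng.uniform(0.5, 1.5, size=n_obs)) # observation error covariance
B = np.diag(rng.uniform(0.5, 1.5, size=n_ctl)) # prior error covariance

# Direct inversion: one (n_obs x n_obs) inverse
direct = np.linalg.inv(R + H @ B @ H.T)

# Woodbury identity: only (n_ctl x n_ctl) inverses besides the (diagonal) R
R_inv = np.diag(1.0 / np.diag(R))
inner = np.linalg.inv(np.linalg.inv(B) + H.T @ R_inv @ H)
woodbury = R_inv - R_inv @ H @ inner @ H.T @ R_inv

assert np.allclose(direct, woodbury)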

full_period : bool, optional, default False

Run the response functions over the whole simulation window. This argument cannot be set to true if the ‘spin_down’ argument is used.

spin_down : str, optional

Spin-down period of the response functions. Must be a valid pandas period alias (1D, 1M, …). The spin-down value can be set here globally for all response functions, or individually per tracer through a control vector option in the datavect (the latter option takes priority when both are used).
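
A minimal illustration of what a pandas period alias represents (the alias string itself is the only thing passed to spin_down):

import pandas as pd
from pandas.tseries.frequencies import to_offset

# '1D' (one day) is a valid pandas period alias
offset = to_offset("1D")
print(pd.Timestamp("2019-01-01") + offset)   # 2019-01-02 00:00:00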

first_period_only : bool, optional, default False

Only run the response functions that correspond to the first time period in the control vector. This option can be used to estimate the computing time required for running the response functions over one period, or to get the ‘relaxation’ time of one period. This option cannot be used with full_period = true.

inicond_component : str, optional, default “inicond”

Initial conditions datavect component name

ignore_tracers : list, optional

List of datavect (component, parameter) couples to ignore. The response functions corresponding to ignored parameters will not be run and their outputs will be filled with zeros.

job_batch_size : int, optional, default 20

Size of the job batches to submit; pyCIF waits for one batch to finish before submitting the next one. If this option is set to zero or a negative number, all jobs will be submitted at the same time (not recommended). When running jobs in a subprocess, setting this option to 1 will trigger the cleaning of the temporary files after every job.
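
The batching behaviour can be pictured with the following sketch; it only illustrates the submit-and-wait logic and is not pyCIF's own job handling, and the configuration file names are placeholders.

import subprocess

def run_in_batches(config_files, batch_size):
    """Submit batch_size jobs, wait for all of them, then submit the next batch."""
    if batch_size <= 0:                      # zero or negative: submit everything at once
        batch_size = len(config_files)
    for start in range(0, len(config_files), batch_size):
        batch = config_files[start:start + batch_size]
        jobs = [subprocess.Popen(["python", "-m", "pycif", cfg]) for cfg in batch]
        for job in jobs:
            job.wait()                       # next batch starts only when this one is done

# Hypothetical configuration files, one per response function
run_in_batches(["config_a.yaml", "config_b.yaml", "config_c.yaml"], batch_size=2)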

pseudo_parallel_job : bool, optional, default False

Run the job batches (of size job_batch_size) in “pseudo parallel” mode, i.e. with a job file of the following format:

python -m pycif config_a.yaml &
python -m pycif config_b.yaml &
python -m pycif config_c.yaml &
wait
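
For reference, a short helper (purely illustrative, not a pyCIF function) that writes a job file in this format could look like:

# Write one '&'-terminated pycif call per configuration file, followed by 'wait'
def write_pseudo_parallel_job(config_files, job_path="job.sh"):
    lines = [f"python -m pycif {cfg} &" for cfg in config_files]
    lines.append("wait")
    with open(job_path, "w") as f:
        f.write("\n".join(lines) + "\n")

write_pseudo_parallel_job(["config_a.yaml", "config_b.yaml", "config_c.yaml"])
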
use_batch_sampling : bool, optional, default False

Group the response functions per time period and run them with the observation operator ‘batch_computation’ mode.

batch_sampling_size : int, optional

Maximum size for the batch sampling batches

separate_parameters : bool, optional, default False

Separate response functions by observation parameters. This option can only be used with use_batch_sampling = True

independant_parameters : bool, optional, default False

If true, parameters (species) are considered “independent”, i.e. one response function only affects the parameter of its control vector tracer and/or the parameters resulting from the control vector transformations (transform_pipe) that take the response function control vector tracer as input. This option can help reduce the number of samples actually present in the batch sampling response function simulations. It can only be used with use_batch_sampling = True.

use_model_approximation : bool, optional, default False

Use the approximation of the model tangent linear operator for the response functions. If use_model_approximation is set to true, run_mode must be set to "tl".

run_reference_forward : bool, optional, default False

Run a reference forward simulation and fill the observation vector ('sim' column) with the results. This option cannot be used with run_mode = 'fwd'.

dump_sparse_arrays : bool, optional, default False

Use COOrdinate (COO) sparse arrays in the dumped NetCDF files. The pycif.utils.sparse_array.to_dense_dataset function can be used to convert the data from the sparse NetCDF files to dense arrays.
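
The sketch below only illustrates the COO storage idea with scipy (indices plus values instead of a full dense array); the exact layout of the dumped NetCDF files and the signature of to_dense_dataset are not reproduced here.

import numpy as np
from scipy.sparse import coo_matrix

# COO storage keeps only the non-zero entries as (row, col, value) triplets
dense = np.array([[0.0, 0.0, 1.5],
                  [0.0, 2.0, 0.0]])
sparse = coo_matrix(dense)
print(sparse.row, sparse.col, sparse.data)     # [0 1] [2 1] [1.5 2.]
back = sparse.toarray()                        # back to a dense array
assert np.array_equal(back, dense)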

dump_obsvect_decompostion : bool, optional, default False

Dump observation vector decomposition by response function

Requirements#

The current plugin requires the following plugins to run properly:

Requirement name   Requirement type   Explicit definition   Any valid   Default name   Default version
----------------   ----------------   -------------------   ---------   ------------   ---------------
platform           Platform           True                  True        None           None
model              Model              False                 True        None           None
obsoperator        ObsOperator        True                  True        standard       std
obsvect            ObsVect            False                 True        standard       std
controlvect        ControlVect        True                  True        standard       std
datavect           DataVect           True                  True        standard       std

YAML template#

Please find below a template for a YAML configuration:

mode:
  plugin:
    name: response-functions
    version: std
    type: mode

    # Optional arguments
    dryrun: XXXXX  # bool
    run_mode: XXXXX  # fwd|tl
    autoflush: XXXXX  # bool
    reload_results: XXXXX  # bool
    reload_h_matrix: XXXXX  # str or list of str
    clamp_h_matrix_to_zero: XXXXX  # bool
    analytical_inversion: XXXXX  # bool
    use_woodbury_identity: XXXXX  # bool or 'auto'
    full_period: XXXXX  # bool
    spin_down: XXXXX  # str
    first_period_only: XXXXX  # bool
    inicond_component: XXXXX  # str
    ignore_tracers: XXXXX  # list
    job_batch_size: XXXXX  # int
    pseudo_parallel_job: XXXXX  # bool
    use_batch_sampling: XXXXX  # bool
    batch_sampling_size: XXXXX  # int
    separate_parameters: XXXXX  # bool
    independant_parameters: XXXXX  # bool
    use_model_approximation: XXXXX  # bool
    run_reference_forward: XXXXX  # bool
    dump_sparse_arrays: XXXXX  # bool
    dump_obsvect_decompostion: XXXXX  # bool