Analytical inversions analytic/std#

Description#

Direct (analytical) Bayesian inversion via explicit H-matrix construction.

Mathematical framework#

This mode performs a Best Linear Unbiased Estimator (BLUE) inversion under the Gaussian error assumption. It first assembles the observation operator matrix \(\mathbf{H} \in \mathbb{R}^{m \times n}\) explicitly, column by column, then solves for the posterior state analytically.

Step 1 — Building H#

The Jacobian matrix is constructed by running one forward simulation per control-vector dimension \(i\):

\[\mathbf{H}_{:,\,i} = \mathcal{H}(\mathbf{e}_i)\]

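where \(\mathbf{e}_i\) is the \(i\)-th canonical basis vector (all zeros except element \(i\), which is set to 1). The construction is valid because the observation operator is assumed to be linear, so that for any control vector \(\mathbf{x}\):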

\[\mathbf{H}\,\mathbf{x} = \sum_i x_i\,\mathcal{H}(\mathbf{e}_i)\]

Each simulation is submitted as an independent pyCIF forward run stored under $workdir/base_functions/.
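The column-by-column construction can be illustrated with a minimal NumPy sketch. It is not the pyCIF implementation: the function `build_h_matrix` and the toy `forward` operator are hypothetical stand-ins, with a small fixed matrix playing the role of the forward simulation.

```python
import numpy as np

def build_h_matrix(forward, n):
    """Assemble H column by column from n forward runs on basis vectors."""
    columns = []
    for i in range(n):
        e_i = np.zeros(n)
        e_i[i] = 1.0  # canonical basis vector e_i
        columns.append(np.asarray(forward(e_i), dtype=float))
    return np.column_stack(columns)  # shape (m, n)

# Hypothetical linear "model": a fixed matrix stands in for the forward run
rng = np.random.default_rng(0)
H_true = rng.normal(size=(4, 3))

def forward(x):
    return H_true @ x

H = build_h_matrix(forward, n=3)

# Linearity check: H x must match a direct forward run on x
x = np.array([2.0, -1.0, 0.5])
linearity_ok = np.allclose(H @ x, forward(x))
```

In this toy case `H` reproduces `H_true` exactly. With a real transport model each call to `forward` is a full simulation, which is why the number of columns \(n\) drives the cost.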

Step 2 — Analytical inversion (BLUE)#

Given the prior \(\mathbf{x}_b\) with background error covariance \(\mathbf{B} \in \mathbb{R}^{n \times n}\) and the observations \(\mathbf{y} \in \mathbb{R}^{m}\) with observation error covariance \(\mathbf{R} \in \mathbb{R}^{m \times m}\), the posterior (analysis) state is:

\[\mathbf{x}_a = \mathbf{x}_b + \mathbf{K}\bigl(\mathbf{y} - \mathbf{H}\,\mathbf{x}_b\bigr)\]

where the Kalman gain is

\[\mathbf{K} = \mathbf{B}\mathbf{H}^\top \bigl(\mathbf{R} + \mathbf{H}\mathbf{B}\mathbf{H}^\top\bigr)^{-1}\]

and the posterior error covariance is

\[\mathbf{P}_a = \mathbf{B} - \mathbf{K}\mathbf{H}\mathbf{B}\]
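The three formulas above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the code pyCIF runs: the function name `blue_inversion` is hypothetical, \(\mathbf{B}\) and \(\mathbf{R}\) are assumed symmetric, and the gain is obtained with a linear solve rather than an explicit inverse.

```python
import numpy as np

def blue_inversion(x_b, B, y, H, R):
    """Return the BLUE analysis state x_a and posterior covariance P_a."""
    S = R + H @ B @ H.T  # innovation covariance, shape (m, m)
    # K = B H^T S^{-1}; since S and B are symmetric, K^T solves S K^T = H B
    K = np.linalg.solve(S, H @ B).T  # Kalman gain, shape (n, m)
    x_a = x_b + K @ (y - H @ x_b)
    P_a = B - K @ H @ B
    return x_a, P_a

# Toy problem: n = 3 control variables, m = 2 observations
x_b = np.zeros(3)
B = np.eye(3)
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
R = np.eye(2)
y = np.array([2.0, 4.0])

x_a, P_a = blue_inversion(x_b, B, y, H, R)
# x_a is [1, 2, 0]; P_a is diag(0.5, 0.5, 1)
```

With equal prior and observation variances, the two observed components move halfway toward the observations and their variance is halved. The third component is not observed, so it keeps its prior value and variance.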

Complexity and scalability#

The dominant cost is the \(n\) forward simulations required to build \(\mathbf{H}\). The matrix inversion \((\mathbf{R} + \mathbf{H}\mathbf{B}\mathbf{H}^\top)^{-1}\) is \(\mathcal{O}(m^3)\) in the observation dimension — feasible for moderate \(m\) but prohibitive for large observing systems. For large problems use the variational (4dvar) or response-functions modes instead.

Warning

One forward simulation per control-vector dimension \(n\) is required. Check \(n\) and the cost of a single forward run before launching. Use the dryrun option to estimate the total wall-clock time without committing to the full computation.
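As a rough illustration of the check recommended above, the total cost can be estimated from \(n\) and a measured per-run time, for instance one obtained from a dry run. The helper `estimate_wall_clock` and the `max_concurrent` parameter are hypothetical and not part of pyCIF. The estimate ignores queue waiting time and assumes all runs take the same time.

```python
import math

def estimate_wall_clock(n, seconds_per_run, max_concurrent=1):
    """Rough wall-clock estimate for the n forward runs needed to build H."""
    batches = math.ceil(n / max_concurrent)  # runs execute in successive waves
    return batches * seconds_per_run

# Example: 500 control-vector dimensions, 30 minutes per run, 20 jobs at a time
total_s = estimate_wall_clock(n=500, seconds_per_run=1800, max_concurrent=20)
total_hours = total_s / 3600  # 12.5 hours

# Same problem with one job at a time (max_concurrent=1)
sequential_hours = estimate_wall_clock(n=500, seconds_per_run=1800) / 3600  # 250 hours
```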

YAML arguments#

The following arguments configure the plugin. pyCIF raises an exception at initialization if a mandatory argument is missing, or if any argument does not match the accepted values or type:

dump_nc_base_control : bool, optional, default False

Save each Dirac control vector (base function input) as NetCDF for post-hoc inspection of what was actually run.

dryrun : bool, optional, default False

Submit only the first base function to estimate the per-run cost, then stop without completing the full H matrix.

sequential : bool, optional, default False

Wait for each job to finish before submitting the next. Useful when concurrent submissions are restricted (e.g. GPU queues).

resp_func_only : bool, optional, default False

Skip the inversion and run only the response functions needed to build the H matrix.

Requirements#

This plugin requires the following plugins to run properly:

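| Requirement name | Requirement type | Explicit definition | Any valid | Default name | Default version |
|------------------|------------------|---------------------|-----------|--------------|-----------------|
| obsvect          | ObsVect          | False               | True      | standard     | std             |
| controlvect      | ControlVect      | True                | True      | standard     | std             |
| obsoperator      | ObsOperator      | True                | True      | standard     | std             |
| platform         | Platform         | True                | True      | None         | None            |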

YAML template#

Please find below a template for a YAML configuration:

mode:
  plugin:
    name: analytic
    version: std
    type: mode

  # Optional arguments
  dump_nc_base_control: XXXXX  # bool
  dryrun: XXXXX  # bool
  sequential: XXXXX  # bool
  resp_func_only: XXXXX  # bool