pycif.plugins.obsvects.standard — API reference

Configuration reference: standard plugin

pycif.plugins.obsvects.standard.build_full_r.build_r(obsvect, **kwargs)

Build the full observation error covariance matrix \(\mathbf{R}\).

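Constructs the square matrix \(\mathbf{R} \in \mathbb{R}^{m \times m}\), where \(m\) is the total length of the observation vector (obsvect.dim). For each observation tracer, it fills that tracer's diagonal block with the squared per-observation errors:

\[R_{ii} = \sigma_{\varepsilon,i}^2 \quad \forall\, i \in [\text{ypointer},\, \text{ypointer} + \text{dim})\]

Here ypointer is the start offset of the tracer's slice in the flat observation vector, and dim is the number of observations in that slice. Off-diagonal elements remain zero (diagonal covariance assumption).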
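The block-filling step can be sketched with NumPy as follows. This is a hypothetical stand-in, not the plugin's code. It assumes the per-tracer slices are passed as a list of (ypointer, dim) pairs and that yobs_err is the flat array of per-observation errors.

```python
import numpy as np


def build_r_sketch(yobs_err, tracer_slices):
    """Dense diagonal R built from per-tracer slices.

    tracer_slices is a hypothetical iterable of (ypointer, dim) pairs,
    one per observation tracer, indexing into the flat yobs_err array.
    """
    yobs_err = np.asarray(yobs_err, dtype=float)
    m = yobs_err.size
    r = np.zeros((m, m))  # off-diagonal entries stay zero
    for ypointer, dim in tracer_slices:
        idx = np.arange(ypointer, ypointer + dim)
        # R_ii = sigma_i^2 on this tracer's diagonal block
        r[idx, idx] = yobs_err[idx] ** 2
    return r
```

For example, `build_r_sketch([1.0, 2.0, 3.0], [(0, 2), (2, 1)])` returns `np.diag([1.0, 4.0, 9.0])`.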

Warning

This returns a dense \(m \times m\) matrix, which quickly becomes memory-prohibitive for large observation vectors (e.g. \(m > 10^4\)). Use rinvprod() instead: it applies \(\mathbf{R}^{-1}\) (or \(\mathbf{R}^{1/2}\)) element-wise without forming the matrix.
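The memory argument is simple arithmetic. Assuming float64 storage, a dense \(m \times m\) matrix needs \(8m^2\) bytes, whereas its diagonal needs only \(8m\). The small helper below (a hypothetical name) makes the numbers concrete. At \(m = 10^4\) the dense matrix takes 0.8 GB against 80 kB for the diagonal, and at \(m = 10^5\) it grows to 80 GB.

```python
import numpy as np


def r_storage_bytes(m, dtype=np.float64):
    """Bytes needed for a dense m x m R versus its diagonal alone."""
    itemsize = np.dtype(dtype).itemsize  # 8 bytes for float64
    dense = m * m * itemsize
    diagonal = m * itemsize
    return dense, diagonal


for m in (10_000, 100_000):
    dense, diagonal = r_storage_bytes(m)
    print(f"m={m}: dense {dense / 1e9:.1f} GB, diagonal {diagonal / 1e6:.2f} MB")
```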

Parameters:
  • obsvect (Plugin) – obsvect plugin instance (provides yobs_err, dim, and access to the datavect tracer metadata).

  • **kwargs – unused; accepted for interface consistency.

Returns:

diagonal matrix of shape (dim, dim) containing the squared observation errors on the main diagonal.

Return type:

np.ndarray

pycif.plugins.obsvects.standard.fetch.default_fetch(ref_dir, ref_file, input_dates, target_dir, tracer=None, **kwargs)

Resolve observation monitor files and symlink them to the run directory.

For each sub-simulation date in input_dates, expands ref_dir and ref_file using strftime formatting, creates a symlink from the source file to target_dir, and returns de-duplicated sorted lists of local file paths and their associated dates.

Parameters:
  • ref_dir (str) – directory template for the source files; may contain strftime format codes (e.g. /data/%Y/%m).

  • ref_file (str) – file name template; may contain strftime codes (e.g. monitor_%Y%m%d.nc).

  • input_dates (dict) – mapping from sub-simulation start dates to lists of dates for which files should be fetched.

  • target_dir (str) – local run directory into which symbolic links are written; the base name of each source file is preserved.

  • tracer – unused; accepted for interface consistency with other fetch functions.

  • **kwargs – unused; accepted for interface consistency.

Returns:

a pair (list_files, list_dates) of dicts, both keyed by the same sub-simulation start dates as input_dates. Each value of list_files is a sorted, de-duplicated list of local file paths. Each value of list_dates is a sorted, de-duplicated list of resolved datetime objects.

Return type:

tuple[dict, dict]
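The fetch loop described above can be sketched with the standard library alone. This is a hypothetical re-implementation (default_fetch_sketch is not a pyCIF name), and the real function may differ, for instance in how it treats missing source files. The sketch makes three assumptions:

- input_dates maps each sub-simulation start to a list of datetime objects.
- A link is skipped when one with the same name already exists in target_dir.
- The tracer and **kwargs arguments are dropped.

```python
import os


def default_fetch_sketch(ref_dir, ref_file, input_dates, target_dir):
    """Hypothetical sketch of the strftime-expand-and-symlink loop.

    input_dates maps each sub-simulation start date to a list of datetimes.
    """
    list_files, list_dates = {}, {}
    for start, dates in input_dates.items():
        files, kept_dates = set(), set()
        for date in dates:
            # Expand the strftime templates for this particular date
            src = os.path.join(date.strftime(ref_dir), date.strftime(ref_file))
            # The link keeps the base name of the source file
            dst = os.path.join(target_dir, os.path.basename(src))
            # Several dates may resolve to the same file: link it only once
            if not os.path.lexists(dst):
                os.symlink(src, dst)
            files.add(dst)
            kept_dates.add(date)
        list_files[start] = sorted(files)
        list_dates[start] = sorted(kept_dates)
    return list_files, list_dates
```

De-duplication matters when several dates resolve to the same file. For example, with a daily monitor_%Y%m%d.nc template, two dates on the same day both map to one link.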

pycif.plugins.obsvects.standard.ini_mapper.ini_mapper(obsvect, general_mapper={}, backup_comps={}, transforms_order=[], ref_transform='', **kwargs)

Build the transform mapper for the observation vector.

Scans the datavect components and collects every (component, tracer) pair whose isobs flag is True. These pairs become the output tracer IDs of the observation operator — i.e. the quantities that the toobsvect system transform will write into obsvect.ysim.

Parameters:
  • obsvect (Plugin) – obsvect plugin instance (carries the populated datavect with component/tracer metadata).

  • general_mapper (dict) – mapper dictionaries from other transforms; unused but accepted for interface consistency.

  • backup_comps (dict) – unused; accepted for interface consistency.

  • transforms_order (list) – unused; accepted for interface consistency.

  • ref_transform (str) – unused; accepted for interface consistency.

  • **kwargs – unused.

Returns:

mapper dict with "inputs": {} (observation vector has no inputs from other transforms) and "outputs": {(comp, trcr): …} for every observation tracer.

Return type:

dict
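The selection rule can be sketched as below. The sketch assumes a hypothetical nested-dict layout {component: {tracer: tracer_object}} for the datavect components. The per-tracer values under "outputs" are left as empty placeholder dicts because their actual contents are elided in this reference.

```python
from types import SimpleNamespace


def ini_mapper_sketch(components):
    """Collect every (component, tracer) pair flagged with isobs.

    components is a hypothetical {component: {tracer: tracer_object}} dict.
    """
    outputs = {}
    for comp, tracers in components.items():
        for trcr, tracer in tracers.items():
            if getattr(tracer, "isobs", False):
                # Placeholder: the real per-tracer metadata is not shown here
                outputs[(comp, trcr)] = {}
    # The observation vector consumes nothing from other transforms
    return {"inputs": {}, "outputs": outputs}


components = {
    "concs": {"CH4": SimpleNamespace(isobs=True), "CO2": SimpleNamespace(isobs=False)},
    "fluxes": {"CH4": SimpleNamespace(isobs=False)},
}
mapper = ini_mapper_sketch(components)
# mapper == {"inputs": {}, "outputs": {("concs", "CH4"): {}}}
```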

pycif.plugins.obsvects.standard.init_rinvprod.init_rinvprod(obsvect, measurements, **kwargs)

Sanitise observation errors before the \(\mathbf{R}^{-1}\) product.

Replaces any non-positive (zero or negative) obserror value in the datastore with the mean obserror of the dataset, so that rinvprod() never divides by zero.

Note

Transport error inflation is mentioned in the source comments as a future extension. At present, only the replacement of non-positive errors is implemented.

Parameters:
  • obsvect (Plugin) – obsvect plugin instance (carries datastore with an obserror column).

  • measurements (Plugin) – unused; kept for API compatibility with the pyCIF plugin interface.

  • **kwargs – unused; accepted for interface consistency.
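The sanitisation can be sketched in NumPy on a bare obserror array rather than the datastore. Which mean is used is an assumption here: the sketch averages only the strictly positive errors, so the invalid entries do not bias the replacement value. NaN entries are left untouched, in line with the description above, and would still be rejected by rinvprod().

```python
import numpy as np


def sanitise_obserror(obserror):
    """Return a copy of obserror with non-positive entries replaced.

    Assumption: the replacement value is the mean of the strictly
    positive errors (the exact averaging rule is not documented here).
    """
    err = np.asarray(obserror, dtype=float).copy()
    valid = err > 0  # NaN compares False, so NaNs are not counted as valid...
    bad = err <= 0   # ...and are not replaced either
    if bad.any():
        if not valid.any():
            raise ValueError("no strictly positive obserror to average")
        err[bad] = err[valid].mean()
    return err
```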

pycif.plugins.obsvects.standard.init_y0.init_y0(obsvect, **kwargs)

Initialise the flat observation-vector arrays from the datavect configuration.

Iterates over every component / tracer pair declared in obsvect.datavect.components and, for each one that is flagged as an observation (tracer.isobs):

  1. Loads the datastore from a pre-computed monitor.nc file (when dir_obsvect is set), or otherwise reads it from the raw monitor files via init_param().

  2. Assigns a contiguous slice in the global yobs / yobs_err / ysim / dy arrays using tracer.ypointer.

  3. Appends the tracer's entries to the obsvect_mask boolean array, which marks the observations that enter the cost function (those with is_obsvect == True).

  4. Optionally applies per-tracer error scaling (obserror_scale and obserror_value).

  5. Compresses the tracer datastore to save memory.

  6. Dumps the final observation vector when obsvect.dump_obs is set.
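Steps 2 and 3 amount to concatenating per-tracer arrays while recording each tracer's offset. The sketch below assumes each tracer is a hypothetical dict holding yobs, yobs_err and is_obsvect arrays. It covers only the flat-array assembly and omits loading (step 1), error scaling (step 4), compression (step 5) and dumping (step 6).

```python
import numpy as np


def assemble_flat_obsvect(tracers):
    """Concatenate per-tracer observations into flat arrays.

    tracers is a hypothetical list of dicts holding 'yobs', 'yobs_err' and
    'is_obsvect' arrays; each dict gains 'ypointer' and 'dim' entries.
    """
    yobs_parts, err_parts, mask_parts = [], [], []
    ypointer = 0
    for trc in tracers:
        dim = len(trc["yobs"])
        trc["ypointer"] = ypointer  # start of this tracer's contiguous slice
        trc["dim"] = dim
        yobs_parts.append(np.asarray(trc["yobs"], dtype=float))
        err_parts.append(np.asarray(trc["yobs_err"], dtype=float))
        # True where the observation enters the cost function
        mask_parts.append(np.asarray(trc["is_obsvect"], dtype=bool))
        ypointer += dim

    def cat(parts, dtype):
        return np.concatenate(parts).astype(dtype) if parts else np.empty(0, dtype)

    yobs = cat(yobs_parts, float)
    yobs_err = cat(err_parts, float)
    obsvect_mask = cat(mask_parts, bool)
    ysim = np.zeros_like(yobs)  # later written by the observation operator
    dy = np.zeros_like(yobs)
    return yobs, yobs_err, ysim, dy, obsvect_mask
```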

Parameters:
  • obsvect (Plugin) – obsvect plugin instance. On entry its flat arrays (yobs, ysim, etc.) are empty; on exit they are filled.

  • **kwargs – forwarded to init_param() and downstream datastream read / fetch methods.

Returns:

the updated obsvect instance (also modified in-place).

Return type:

Plugin

pycif.plugins.obsvects.standard.rinvprod.rinvprod(obsvect, dy: ndarray[tuple[Any, ...], dtype[floating]], inverse: bool = True, mask: ndarray[tuple[Any, ...], dtype[bool]] | None = None) → ndarray[tuple[Any, ...], dtype[floating]]

Apply the observation error covariance (or its inverse) to a vector.

Assumes a diagonal observation error covariance matrix \(\mathbf{R} = \mathrm{diag}(\sigma_{\varepsilon,1}^2, \ldots, \sigma_{\varepsilon,m}^2)\).

Two modes of operation:

  • inverse=True (default) — computes \(\mathbf{R}^{-1}\,\delta\mathbf{y}\):

    \[(\mathbf{R}^{-1}\,\delta\mathbf{y})_i = \frac{\delta y_i}{\sigma_{\varepsilon,i}^2}\]

    Used in the cost function gradient: \(\nabla J_o = \mathbf{H}^\top \mathbf{R}^{-1} (\mathcal{H}(\mathbf{x}) - \mathbf{y}^o)\).

  • inverse=False — computes a noise-perturbed observation sample \(\mathbf{y}^o + \mathbf{R}^{1/2}\,\delta\mathbf{y}\):

    \[(\mathbf{R}^{1/2}\,\delta\mathbf{y} + \mathbf{y}^o)_i = \sigma_{\varepsilon,i}\,\delta y_i + y^o_i\]

    Used for Monte Carlo sampling of perturbed observations.
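Both modes can be illustrated with a linear observation operator written as an explicit matrix H, which is a simplification for illustration. The sketch assumes the usual quadratic observation term \(J_o = \tfrac{1}{2}(\mathbf{H}\mathbf{x} - \mathbf{y}^o)^\top \mathbf{R}^{-1}(\mathbf{H}\mathbf{x} - \mathbf{y}^o)\), whose gradient is the expression given above. The function names are hypothetical.

```python
import numpy as np


def jo_and_gradient(H, x, yobs, yobs_err):
    """J_o and its gradient for a linear operator H and diagonal R."""
    dy = H @ x - yobs                # departures H(x) - y^o
    r_inv_dy = dy / yobs_err**2      # inverse=True mode: R^{-1} dy
    jo = 0.5 * dy @ r_inv_dy
    grad = H.T @ r_inv_dy            # H^T R^{-1} (H(x) - y^o)
    return jo, grad


def perturbed_observations(yobs, yobs_err, rng):
    """One sample y^o + R^{1/2} eps with eps ~ N(0, I) (inverse=False mode)."""
    eps = rng.standard_normal(yobs.shape)
    return yobs_err * eps + yobs
```

Comparing grad against central finite differences of jo is a cheap way to validate such a gradient.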

An optional mask restricts the operation to the active observation subset (obsvect.obsvect_mask); masked-out positions are set to zero in the output.

Parameters:
  • obsvect (Plugin) – obsvect plugin instance (provides yobs_err and yobs).

  • dy (np.ndarray) – input vector, shape (dim,).

  • inverse (bool) – if True apply \(\mathbf{R}^{-1}\); if False apply \(\mathbf{R}^{1/2}\) and add yobs.

  • mask (np.ndarray of bool, optional) – boolean mask of shape (dim,) selecting a subset of observations. Positions where the mask is False are set to zero in the output.

Returns:

result vector, shape (dim,).

Return type:

np.ndarray

Raises:

ValueError – if mask has a different shape than dy, or if any entry of yobs_err is zero or NaN (which would make \(\mathbf{R}^{-1}\) undefined or infinite).
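The contract above can be sketched in NumPy. For illustration, the sketch takes the flat yobs and yobs_err arrays directly instead of an obsvect instance. It follows the documented rules: zero or NaN errors and a mis-shaped mask raise ValueError, and masked-out positions are zero in the output.

```python
import numpy as np


def rinvprod_sketch(yobs, yobs_err, dy, inverse=True, mask=None):
    """Diagonal-R product following the documented rinvprod contract."""
    yobs = np.asarray(yobs, dtype=float)
    err = np.asarray(yobs_err, dtype=float)
    dy = np.asarray(dy, dtype=float)
    mask = np.ones(dy.shape, dtype=bool) if mask is None else np.asarray(mask, dtype=bool)
    if mask.shape != dy.shape:
        raise ValueError("mask must have the same shape as dy")
    if np.any(err == 0) or np.any(np.isnan(err)):
        raise ValueError("yobs_err contains zero or NaN entries")
    out = np.zeros_like(dy)  # masked-out positions stay zero
    if inverse:
        out[mask] = dy[mask] / err[mask] ** 2          # R^{-1} dy
    else:
        out[mask] = err[mask] * dy[mask] + yobs[mask]  # y^o + R^{1/2} dy
    return out
```

Because \(\mathbf{R}\) is diagonal, the inverse=True result equals what np.linalg.solve gives on the dense matrix from build_r. The element-wise form costs O(m) instead of O(m³).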