pycif.plugins.obsoperators.standard — API reference


Configuration reference: standard plugin

pycif.plugins.obsoperators.standard.check.check_inputs(inputs, mode)

Check the consistency of inputs given to the observation operator.

Validates that mode is one of the accepted values and that inputs carries the attributes required by that mode.

Parameters:
  • inputs – control or observation vector passed to the operator; must expose at least x for 'fwd' mode, and both x and dx for 'tl' mode.

  • mode (str) – requested execution mode — one of 'fwd', 'tl', or 'adj'.

Returns:

True if all checks pass.

Return type:

bool

Raises:
  • Exception – if mode is not one of 'fwd', 'tl', or 'adj'.

  • Exception – if mode is 'tl' and inputs does not expose both x and dx.
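The checks described above can be pictured with a small stand-alone sketch. It is not the pyCIF implementation: `check_inputs_sketch` and `ACCEPTED_MODES` are hypothetical names, the attribute requirements for 'adj' are not stated in this reference (so none are enforced), and raising when x is missing in 'fwd' mode is an assumption.

```python
from types import SimpleNamespace

ACCEPTED_MODES = ("fwd", "tl", "adj")  # hypothetical constant for this sketch


def check_inputs_sketch(inputs, mode):
    """Illustrative re-implementation of the documented checks (not pyCIF code)."""
    if mode not in ACCEPTED_MODES:
        raise Exception(f"Unknown mode {mode!r}; expected one of {ACCEPTED_MODES}")

    # Attributes documented as required per mode. Requirements for 'adj' are
    # not given in the reference, so this sketch enforces none (assumption).
    required = {"fwd": ("x",), "tl": ("x", "dx"), "adj": ()}[mode]
    missing = [name for name in required if not hasattr(inputs, name)]
    if missing:
        raise Exception(f"inputs lacks {missing} required by mode {mode!r}")
    return True


# Usage with a stand-in object instead of a real control vector.
state = SimpleNamespace(x=[1.0, 2.0])
print(check_inputs_sketch(state, "fwd"))  # True
```

Calling the sketch with `mode="tl"` on the same object raises, because dx is absent.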

pycif.plugins.obsoperators.standard.flushrun.flushrun(self, workdir, rundir, mode, transform_pipe, full_flush=True)

Remove intermediate files produced by transforms that are no longer needed.

Iterates over every transform in transform_pipe and calls each transform’s own flushrun method to clean up its output files in rundir. In adjoint mode, when the operator is not running in approximate mode, the forward reference directory of each transform is also flushed — provided it lies inside workdir, to avoid accidentally deleting files outside the managed tree.

Parameters:
  • self (ObsOperator) – the obs-operator plugin instance.

  • workdir (str) – root working directory; used to check that transf.adj_refdir is a safe path to flush.

  • rundir (str) – the run sub-directory whose files should be cleaned.

  • mode (str) – execution mode — one of 'fwd', 'tl', or 'adj'; controls whether adj_refdir of each transform is also flushed.

  • transform_pipe – the Transform object holding all transforms for this run.

  • full_flush (bool, optional) – forwarded to each transform’s own flushrun; if False only a partial cleanup is performed (exact behaviour is transform-specific). Defaults to True.

Raises:

PluginError – not propagated to the caller: if a transform’s flushrun raises it, the error is caught internally and logged as a warning, and execution continues with the remaining transforms.
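As a rough illustration of the safety check and error handling described for this function, the sketch below flushes a list of stand-in transforms. Everything in it is an assumption made for illustration: the local `PluginError` class, the `is_inside` and `flush_all` helpers, the `approx` flag standing in for "approximate mode", and the two-argument `flushrun(directory, full_flush)` call on each transform. The real per-transform signature is not given in this reference.

```python
import logging
import os
from types import SimpleNamespace


class PluginError(Exception):
    """Local stand-in for pyCIF's PluginError (defined here for the sketch)."""


def is_inside(path, root):
    """Return True if `path` is `root` itself or lies below it."""
    path, root = os.path.abspath(path), os.path.abspath(root)
    try:
        return os.path.commonpath([path, root]) == root
    except ValueError:  # e.g. paths on different drives
        return False


def flush_all(transforms, workdir, rundir, mode, approx=False, full_flush=True):
    """Flush rundir for every transform, plus adj_refdir when it is safe."""
    flushed = []
    for transf in transforms:
        targets = [rundir]
        refdir = getattr(transf, "adj_refdir", None)
        # Only touch the forward reference directory in non-approximate
        # adjoint mode, and only if it sits inside the managed tree.
        if mode == "adj" and not approx and refdir and is_inside(refdir, workdir):
            targets.append(refdir)
        for target in targets:
            try:
                transf.flushrun(target, full_flush)  # hypothetical signature
                flushed.append(target)
            except PluginError as err:
                # Log and carry on with the remaining transforms.
                logging.warning("flushrun failed for %s: %s", target, err)
    return flushed


# Usage: one transform inside the work tree, one pointing outside it.
safe = SimpleNamespace(adj_refdir="/work/fwd_0", flushrun=lambda d, f: None)
unsafe = SimpleNamespace(adj_refdir="/elsewhere/fwd_0", flushrun=lambda d, f: None)
print(flush_all([safe, unsafe], "/work", "/work/adj_1", "adj"))
```

Comparing path components with `os.path.commonpath` rather than string prefixes avoids treating a sibling such as `/workdir2` as being inside `/work`.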

pycif.plugins.obsoperators.standard.obsoper.obsoper(self, controlvect, obsvect, mode, run_id=0, datei=datetime.datetime(1979, 1, 1, 0, 0), datef=datetime.datetime(2100, 1, 1, 0, 0), workdir='./', reload_results=False, check_transforms=False, ignore_exceptions=False, force_fetch_results=False, **kwargs)

Run the standard observation operator in forward, tangent-linear or adjoint mode.

Orchestrates the full observation-operator pipeline:

  • Creates a per-run sub-directory obsoperator/<mode>_<run_id>/ under workdir.

  • If reload_results is set, attempts to recover cached outputs from a previous run before computing from scratch.

  • Dispatches to obsoper_serial() or obsoper_parallel() depending on whether self.parallel is configured; forward mode is always run serially.

  • Dumps the resulting observation or control vector to disk for later use.

Parameters:
  • self (ObsOperator) – the obs-operator plugin instance.

  • controlvect (ControlVect) – control-vector object. Must carry x (and dx for 'tl' mode); receives dx in 'adj' mode.

  • obsvect (ObsVect) – observation-vector object. Receives ysim (and dy for 'tl' mode); provides dy in 'adj' mode.

  • mode (str) – execution mode — one of 'fwd', 'tl', or 'adj'.

  • run_id (int | str, optional) – identifier for the current run; used to name the sub-directory. Defaults to 0.

  • datei (datetime.datetime, optional) – start date of the simulation window. Defaults to datetime.datetime(1979, 1, 1).

  • datef (datetime.datetime, optional) – end date of the simulation window. Defaults to datetime.datetime(2100, 1, 1).

  • workdir (str, optional) – parent directory in which the run sub-directory is created. Defaults to './'.

  • reload_results (bool, optional) – if True, attempt to recover pre-computed outputs from the run sub-directory before running the full pipeline. Defaults to False.

  • check_transforms (bool, optional) – if True, run each transform in both directions and verify the adjoint / TL identity; disables result reloading. Defaults to False.

  • ignore_exceptions (bool, optional) – if True, non-fatal transform errors are logged and swallowed rather than re-raised. Defaults to False.

  • force_fetch_results (bool, optional) – if True and cached outputs cannot be found, raise IOError instead of computing. Defaults to False.

  • **kwargs – extra keyword arguments (ignored).

Returns:

ObsVect: in 'fwd' and 'tl' modes, the updated obsvect with ysim (and dy in 'tl' mode) populated.

ControlVect: in 'adj' mode, the updated controlvect with dx populated.

Return type:

ObsVect or ControlVect

Raises:
  • TypeError – if run_id is neither an int nor a str.

  • IOError – if force_fetch_results is True and cached outputs cannot be loaded.
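The run-directory naming and the interaction between reload_results, check_transforms and force_fetch_results can be sketched as follows. This is only an illustration under stated assumptions: `prepare_rundir`, `try_reload` and the cache file name `cached_output.pickle` are hypothetical, and treating force_fetch_results as implying a reload attempt is a guess, since the reference does not say how the two flags combine.

```python
import os
import pickle
import tempfile

CACHE_NAME = "cached_output.pickle"  # hypothetical file name, not from pyCIF


def prepare_rundir(workdir, mode, run_id=0):
    """Create and return workdir/obsoperator/<mode>_<run_id>/."""
    if not isinstance(run_id, (int, str)):
        raise TypeError(f"run_id must be int or str, got {type(run_id).__name__}")
    rundir = os.path.join(workdir, "obsoperator", f"{mode}_{run_id}")
    os.makedirs(rundir, exist_ok=True)
    return rundir


def try_reload(rundir, reload_results=False, check_transforms=False,
               force_fetch_results=False):
    """Return cached outputs if available, else None (or raise IOError)."""
    # check_transforms is documented as disabling result reloading.
    wants_cache = (reload_results or force_fetch_results) and not check_transforms
    if not wants_cache:
        return None
    path = os.path.join(rundir, CACHE_NAME)
    if os.path.isfile(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    if force_fetch_results:
        raise IOError(f"No cached outputs found in {rundir}")
    return None  # caller falls back to computing from scratch


# Usage in a throw-away directory.
with tempfile.TemporaryDirectory() as tmp:
    rundir = prepare_rundir(tmp, "fwd", run_id=3)
    print(os.path.relpath(rundir, tmp))             # obsoperator/fwd_3 on POSIX
    print(try_reload(rundir, reload_results=True))  # None: nothing cached yet
```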

pycif.plugins.obsoperators.standard.parallel.run_pycif_in_subprocess(python_path, yaml_path)

Run a pyCIF configuration file in a blocking subprocess.

Launches python_path -m pycif yaml_path, redirecting stdout to subprocess_stdout.log and stderr to subprocess_stderr.log in the same directory as yaml_path.

Parameters:
  • python_path (str) – path to the Python interpreter (e.g. self.platform.python).

  • yaml_path (str) – absolute path to the pyCIF YAML configuration file to execute.

Raises:

RuntimeError – if the subprocess exits with a non-zero return code.
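A minimal sketch of the behaviour described above, using only the standard library, might look like this. The function name `run_in_subprocess` and its `module` keyword are hypothetical additions (the keyword exists only so the sketch can be tried without pyCIF installed); the log file names and the `-m pycif` invocation are taken from the description.

```python
import os
import subprocess


def run_in_subprocess(python_path, yaml_path, module="pycif"):
    """Run `python_path -m module yaml_path` and block until it finishes."""
    logdir = os.path.dirname(os.path.abspath(yaml_path))
    out_path = os.path.join(logdir, "subprocess_stdout.log")
    err_path = os.path.join(logdir, "subprocess_stderr.log")

    # Redirect both streams to log files next to the configuration file.
    with open(out_path, "w") as out, open(err_path, "w") as err:
        result = subprocess.run(
            [python_path, "-m", module, yaml_path], stdout=out, stderr=err
        )

    if result.returncode != 0:
        raise RuntimeError(
            f"{module} exited with code {result.returncode}; see {err_path}"
        )
```

Passing the command as a list rather than a shell string sidesteps quoting problems when either path contains spaces.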

pycif.plugins.obsoperators.standard.parallel.obsoper_parallel(self, controlvect, obsvect, rundir, mode, workdir, check_transforms, ignore_exceptions)

Run the observation operator in parallel over independent time segments.

Splits the simulation window [self.datei, self.datef] into segments of length self.parallel.segments with optional boundary overlap self.parallel.overlap, then runs each segment independently — either as subprocesses (self.parallel.subprocess = True) or as HPC jobs via the platform plugin.

Each segment is configured via a freshly dumped YAML file that restricts the approx_operator window to its date range, then executed with run_pycif_in_subprocess() or self.platform.submit_job.
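To make the segmentation concrete, here is one way the window could be cut. This is a sketch under assumptions, not the pyCIF algorithm: `split_window` is a hypothetical helper, segment length and overlap are taken as `timedelta` values (the actual format of self.parallel.segments and self.parallel.overlap is not documented here), and the overlap is assumed to extend each segment backwards from its nominal start.

```python
from datetime import datetime, timedelta


def split_window(datei, datef, segment, overlap=timedelta(0)):
    """Return (start, end) pairs covering [datei, datef]."""
    if segment <= timedelta(0):
        raise ValueError("segment length must be positive")
    bounds = []
    start = datei
    while start < datef:
        end = min(start + segment, datef)
        # ASSUMPTION: the overlap is applied at the start of each segment
        # and never reaches before the global start date.
        bounds.append((max(start - overlap, datei), end))
        start = end
    return bounds


# Usage: a 30-day window, 10-day segments, 2-day overlap.
for seg_start, seg_end in split_window(
    datetime(2020, 1, 1), datetime(2020, 1, 31),
    timedelta(days=10), timedelta(days=2),
):
    print(seg_start.date(), "->", seg_end.date())
# 2020-01-01 -> 2020-01-11
# 2020-01-09 -> 2020-01-21
# 2020-01-19 -> 2020-01-31
```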

After all segments finish, their outputs are reassembled:

  • 'tl' mode — obsvect.ysim and obsvect.dy are set to the element-wise sums over all segment observation vectors.

  • 'adj' mode — controlvect.dx is set to the element-wise sum over all segment adjoint sensitivities; controlvect.x and controlvect.xb are reset to their pre-run values.
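The reassembly step amounts to an element-wise sum over per-segment vectors. The sketch below uses plain lists and `SimpleNamespace` stand-ins instead of pyCIF's vector objects; `elementwise_sum` is a hypothetical helper, and the sample values assume each segment contributes zeros outside its own window, which the reference does not state explicitly.

```python
from types import SimpleNamespace


def elementwise_sum(vectors):
    """Sum equally sized sequences position by position."""
    vectors = [list(v) for v in vectors]
    if not vectors:
        raise ValueError("need at least one vector")
    if any(len(v) != len(vectors[0]) for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(values) for values in zip(*vectors)]


# 'tl' mode: combine the simulated values and increments of two segments.
segments = [
    SimpleNamespace(ysim=[1.0, 2.0, 0.0], dy=[0.5, 0.25, 0.0]),
    SimpleNamespace(ysim=[0.0, 0.0, 4.0], dy=[0.0, 0.0, 0.75]),
]
obsvect = SimpleNamespace(ysim=None, dy=None)
obsvect.ysim = elementwise_sum(s.ysim for s in segments)
obsvect.dy = elementwise_sum(s.dy for s in segments)
print(obsvect.ysim, obsvect.dy)  # [1.0, 2.0, 4.0] [0.5, 0.25, 0.75]

# 'adj' mode: sum the sensitivities, then restore x and xb as saved before the run.
controlvect = SimpleNamespace(x=[9.0, 9.0], xb=[9.0, 9.0], dx=None)
saved_x, saved_xb = [1.0, 2.0], [1.0, 1.0]
controlvect.dx = elementwise_sum([[0.5, 0.0], [0.25, 1.0]])
controlvect.x, controlvect.xb = saved_x, saved_xb
print(controlvect.dx)  # [0.75, 1.0]
```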

Parameters:
  • self (ObsOperator) – the obs-operator plugin instance. Must have self.parallel (with segments, overlap, subprocess attributes), self.datei, self.datef, self.ref_fwd_dir, and self.platform set.

  • controlvect (ControlVect) – control-vector object.

  • obsvect (ObsVect) – observation-vector object.

  • rundir (str) – the run sub-directory for this operator call.

  • mode (str) – execution mode — 'tl' or 'adj'. (Forward mode is always dispatched to serial execution.)

  • workdir (str) – parent working directory.

  • check_transforms (bool) – if True, validate each segment’s transform adjoint / TL identity.

  • ignore_exceptions (bool) – if True, non-fatal transform errors inside segments are swallowed.

Raises:

RuntimeError – if a subprocess-based segment exits with a non-zero return code (propagated from run_pycif_in_subprocess()).

pycif.plugins.obsoperators.standard.serial.obsoper_serial(self, controlvect, obsvect, rundir, mode, workdir, check_transforms, ignore_exceptions)

Run the observation operator sequentially over all transforms and time steps.

Handles bookkeeping common to every serial execution:

  • 'fwd' / 'tl' — zeros obsvect.ysim and obsvect.dy, then dumps the control vector to rundir/controlvect.pickle.

  • 'adj' — initialises controlvect.dx = 0 and enables forward-run chaining for multi-step models.

Dispatches to the Dask execution path (init_dask()) when self.use_dask is set, otherwise runs the standard transform loop via do_transforms().

After the run, calls flushrun() to clean up intermediate files when self.autoflush is set (and the operator is not running in parallel mode).

Stores rundir as self.ref_fwd_dir after a forward run so that the subsequent adjoint can locate the forward outputs.
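The mode-dependent bookkeeping can be pictured with the following sketch. It uses list-valued stand-in objects and a hypothetical `prepare_serial_run` helper; writing the control vector with `pickle.dump` is an assumption suggested by the file name controlvect.pickle, and the real code may use the control vector's own dump method instead.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def prepare_serial_run(controlvect, obsvect, rundir, mode):
    """Reset outputs before a serial run, following the documented steps."""
    if mode in ("fwd", "tl"):
        # Start from zeroed simulated values and increments.
        obsvect.ysim = [0.0] * len(obsvect.ysim)
        obsvect.dy = [0.0] * len(obsvect.dy)
        # Keep a copy of the control vector alongside the run outputs.
        with open(os.path.join(rundir, "controlvect.pickle"), "wb") as handle:
            pickle.dump(controlvect, handle)
    elif mode == "adj":
        # The adjoint accumulates sensitivities, so dx starts at zero.
        controlvect.dx = [0.0] * len(controlvect.x)
    else:
        raise ValueError(f"unknown mode {mode!r}")


# Usage with stand-in vectors in a throw-away directory.
controlvect = SimpleNamespace(x=[1.0, 2.0], dx=[0.3, 0.4])
obsvect = SimpleNamespace(ysim=[5.0, 6.0, 7.0], dy=[1.0, 1.0, 1.0])
with tempfile.TemporaryDirectory() as rundir:
    prepare_serial_run(controlvect, obsvect, rundir, "fwd")
    print(sorted(os.listdir(rundir)))  # ['controlvect.pickle']
print(obsvect.ysim)                    # [0.0, 0.0, 0.0]
```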

Parameters:
  • self (ObsOperator) – the obs-operator plugin instance.

  • controlvect (ControlVect) – control-vector object.

  • obsvect (ObsVect) – observation-vector object.

  • rundir (str) – the run sub-directory for this operator call.

  • mode (str) – execution mode — one of 'fwd', 'tl', or 'adj'.

  • workdir (str) – parent working directory; forwarded to flushrun().

  • check_transforms (bool) – if True, validate each transform’s adjoint / TL identity.

  • ignore_exceptions (bool) – if True, non-fatal transform errors are swallowed rather than re-raised.
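The reference does not spell out which identity check_transforms verifies. A common choice for adjoint / TL validation is the dot-product test, ⟨H dx, dy⟩ = ⟨dx, Hᵀ dy⟩, and the sketch below illustrates that test for a small linear operator stored as a nested list. All function names are hypothetical and the tolerance is arbitrary; whether pyCIF applies exactly this test is an assumption.

```python
def dot(a, b):
    """Euclidean inner product of two equally sized sequences."""
    return sum(p * q for p, q in zip(a, b))


def tangent_linear(H, dx):
    """Apply the linear operator H (list of rows) to an increment dx."""
    return [dot(row, dx) for row in H]


def adjoint(H, dy):
    """Apply the transpose of H to an observation-space increment dy."""
    return [dot(col, dy) for col in zip(*H)]


def passes_dot_product_test(H, dx, dy, rtol=1e-12):
    """Check <H dx, dy> == <dx, H^T dy> up to a relative tolerance."""
    lhs = dot(tangent_linear(H, dx), dy)
    rhs = dot(dx, adjoint(H, dy))
    return abs(lhs - rhs) <= rtol * max(abs(lhs), abs(rhs), 1.0)


# Usage: a 3x2 operator mapping a 2-element control increment to 3 observations.
H = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
dx = [1.0, -1.0]
dy = [2.0, 0.0, 1.0]
print(tangent_linear(H, dx))               # [-1.0, -1.0, -1.0]
print(adjoint(H, dy))                      # [7.0, 10.0]
print(passes_dot_product_test(H, dx, dy))  # True
```

A mismatch between the two inner products beyond round-off would point to an adjoint that is not the exact transpose of the tangent-linear code.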