Main CIF observation operator standard/std#
Description#
This is the main observation operator for pyCIF. It is called by most execution modes and heavily relies on so-called transforms for elementary operations.
Indeed, the observation operator can be decomposed into sub-operations as follows:
See details about the transforms here, in particular their individual documentation and the general input/output format.
Transform pipeline#
In pyCIF, the successive transforms are arranged into a so-called pipeline.
The steps to initialize a pipeline consistent with the user-defined configuration
are carried out in the function:
- pycif.plugins.obsoperators.standard.transforms.init_transform(self)[source]
 Initialize the transform pipeline according to user choices. This includes the explicit definition of sub-pipelines (main, control vector side and observation vector side), definitions based on aliases in the datavect, and transforms automatically added depending on the compatibility of successive input/output formats (including domain definition, dates and units).
Note
To compute a given pipeline, the observation operator first walks the pipeline backwards in a dry-run mode. This initialization step allows propagating metadata about what output format is needed for transformations.
For instance, metadata about observations need to be propagated backwards, so pyCIF knows where to extract concentrations in the CTM, before running it forward.
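The backward dry run described in the note above can be sketched as follows. This is purely illustrative, not pyCIF's actual code: the transform names, the `input_needs` callables and the `required_outputs` key are hypothetical stand-ins for the metadata that pyCIF propagates.

```python
# Illustrative sketch (not pyCIF's implementation): a backward "dry run"
# that propagates output requirements from the end of the pipeline to its
# start, so each step knows what to produce before anything runs forward.

def dry_run_backward(pipeline, final_requirements):
    """Walk the pipeline backwards, letting each transform translate the
    requirements on its outputs into requirements on its inputs."""
    needed = dict(final_requirements)
    for transform in reversed(pipeline):
        # Record what the successors expect from this transform
        transform["required_outputs"] = dict(needed)
        # Translate output requirements into input requirements
        needed = transform["input_needs"](needed)
    return needed

# Toy pipeline: a model run followed by a satellite operator; walking
# backwards tells the model which fields it must extract
pipeline = [
    {"name": "run_model",
     "input_needs": lambda req: {**req, "fields": ["CO2"]}},
    {"name": "satellites",
     "input_needs": lambda req: {**req, "averaging_kernels": True}},
]
requirements = dry_run_backward(pipeline, {"dates": ["2019-01-01"]})
# "requirements" now gathers everything the first step must be given
```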
Main pipeline#
The observation operator builds the transformation pipeline according to information specified in the control vector transform_pipe, in the observation vector transform_pipe and in the observation operator transform_pipe.
The functions used to determine the main pipe are the following (by order of execution):
- pycif.plugins.obsoperators.standard.transforms.init_mainpipe(self, all_transforms, backup_comps, mapper)[source]
 Initialize the core of the transform_pipe depending on the list of transformations specified in obsoper.transform_pipe.
Warning
If no transform_pipe is specified, the CTM model specified in the Yaml is run by default.
Conversely, if transform_pipe is specified in the observation operator, only the transforms explicitly listed will be used. Thus, if custom transforms need to be run on top of the model, one should not forget to include the transform run_model in the transform_pipe. Another option (recommended for most applications) is to use the controlvect and obsvect transform_pipes to define transforms related to the control vector and to the observation vector respectively.
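As an illustration of the warning above, a configuration running a custom transform on top of the model could look like the sketch below; the name my_custom_transform and its arguments are purely illustrative and should be adapted to the chosen transform plugin:

```yaml
obsoperator:
  plugin:
    name: standard
    version: std
    type: obsoperator
  transform_pipe:
    my_custom_transform:   # name is free; explicit names help debugging
      XXXXX: XXXXX         # arguments of the corresponding transform plugin
    run_model: {}          # include the model itself, or it will not run
```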
- pycif.plugins.obsoperators.standard.transforms.init_control_transformations(self, all_transforms, controlvect, backup_comps, mapper)[source]
 Initialize transforms on the control vector side.
Also loops over all components/tracers of the datavect and, for those for which the argument unit_conversion is specified, applies the unit_conversion transform.
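For example, a unit conversion on a given tracer could be requested as sketched below; the sub-argument shown under unit_conversion (new_unit) is illustrative and should be checked against the transform's own documentation:

```yaml
datavect:
  components:
    flux:
      parameters:
        CO2:
          dir: XXX
          file: XXX
          unit_conversion:
            new_unit: XXXXX   # target unit; argument name is illustrative
```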
- pycif.plugins.obsoperators.standard.transforms.init_obsvect_transformations(self, all_transforms, obsvect, backup_comps, mapper)[source]
 Initialize transforms on the observation vector side.
Also, for the component satellite of the datavect, includes the satellites transform in the pipeline.
Connecting and ordering transforms into a pipeline#
- pycif.plugins.obsoperators.standard.transforms.connect_pipes(all_transforms, mapper, transform)[source]
 Connect transforms based on their inputs and outputs
- pycif.plugins.obsoperators.standard.transforms.period_pipe(self, all_transforms, mapper)[source]
 Arrange all transformations for all their sub-simulation periods into a pipe whose order respects the required precursors and successors for each transformation.
First, propagate sub-simulation periods to precursors/successors for transformations which do not have pre-defined sub-simulation periods.
Second, define a graph from all the precursors of all transformations.
Last, walk the graph to define the proper order of the transformations.
- Parameters:
 all_transforms – the object gathering all transformations
mapper – the dictionary containing all information about the input/output of each transformation
- Returns:
 the pipes to be computed in forward and backward mode, including for each direction a dry run in the other direction for initialization
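The ordering step of period_pipe can be sketched with a topological sort. This is a minimal illustration, not pyCIF's implementation; the transform names and precursor relations below are hypothetical:

```python
# Minimal sketch (not pyCIF's code) of ordering transformations from
# their precursor relations: build a graph from the precursors of all
# transformations, then walk it so every transform runs after the
# transforms it depends on.
from graphlib import TopologicalSorter  # standard library, Python >= 3.9

# Each entry maps a transform to the set of transforms that must run first
precursors = {
    "regrid_fluxes": set(),
    "unit_conversion": set(),
    "run_model": {"regrid_fluxes", "unit_conversion"},
    "satellites": {"run_model"},
}
forward_pipe = list(TopologicalSorter(precursors).static_order())
# The backward (adjoint) pipe simply reverses the forward order
backward_pipe = list(reversed(forward_pipe))
```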
Automatic pipeline#
After initializing the main pipeline of required transforms, the observation
operator checks the consistency of the horizontal and vertical extents, of the temporal
resolution, and of the data units to determine extra intermediate transformations to be
carried out.
More precisely, for every successive transform of the main pipeline,
the observation operator checks whether the output format of the precursor transform
is consistent with the input format of the successor transform.
This check includes the definition of the domain (horizontal and vertical extent),
of the input_dates (temporal definition) and of the unit.
The corresponding transforms that may be included at this step are:
For each of the above-mentioned transforms, it is possible to explicitly specify extra
parameters in the related component/tracer of the datavect as follows:
datavect:
  components:
    flux:
      parameters:
        CO2:
          dir: XXX
          file: XXX
          regrid:
            method: mass-conservation
All these operations are done in the function:
YAML arguments#
The following arguments are used to configure the plugin. pyCIF will raise an exception at initialization if mandatory arguments are not specified, or if any argument does not fit the accepted values or types:
Optional arguments#
- autorestart : bool, optional, default False
 If interrupted, computations restart from the last simulated period. WARNING: the CIF cannot detect whether this period has been correctly written or is corrupt; it is necessary to check manually in the relevant directories and remove the last simulated period if a file has not been correctly written.
- autoflush : bool, optional, default False
 Remove big temporary files when the run is done
- force-full-flush : bool, optional, default False
 Complementary to autoflush. Also flushes files needed to run an adjoint. Use this option when no adjoint is needed later. The option is triggered only if autoflush is True
- save_debug : bool, optional, default False
 Force transforms to save debugging information. Intermediate datastores will be saved in the directory $workdir/obsoperator/$run_id/transform_debug/
Warning
This option saves every intermediate state of the transformation pipeline. It drastically slows the computation of the observation operator and can take a lot of disk space. It should be used only for debugging or understanding what happens along the way.
- force_full_operator : bool, optional, default False
 Force computing all transforms in the observation operator, even if no observation is to be simulated.
- init_inputs : optional
 Structure of components and parameters to initialize. In this case, there is no need to define an execution mode; only the required inputs will be computed. Moreover, with this option, it is possible to provide a partial yaml paragraph for the datavect object: only the components required to generate the requested inputs are checked before execution.- Argument structure:
 - any_key : optional
 Name of a given component to be initialized
- Argument structure:
 - parameters : list, optional
 List of parameters to initialize for the corresponding component. Initialize all parameters if not specified
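As a sketch (the component and parameter names below are illustrative), init_inputs can be used as follows to generate CO2 fluxes only, without defining an execution mode:

```yaml
obsoperator:
  plugin:
    name: standard
    version: std
    type: obsoperator
  init_inputs:
    flux:            # component to initialize (illustrative name)
      parameters:
        - CO2        # initialize only this parameter
    meteo: {}        # no "parameters" key: all parameters are initialized
```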
- transform_pipe : optional
 List of transformations to build the main observation operator pipeline
- Argument structure:
 - any_key : optional
 Name of a given transformation to be included. The name has no impact on the way the observation operator is computed, although it is recommended to use explicit names to help debugging.
- Argument structure:
 - **args : optional
 Arguments to set-up the given transform
- parallel : optional
 Physical parallelization of the computation of the tangent-linear (TL) and adjoint
- Argument structure:
 - segments : str, mandatory
 Length of each parallel segment
- overlap : str, mandatory
 Length of the initial overlap with previous segments
- subprocess : bool, optional, default False
 If True, submit the segments as subprocesses; otherwise, submit them as new jobs with the platform plugin
- nproc : int, optional
 Number of processors to attribute to each segment when 'subprocess' is True (works with LMDz only)
- ref_fwd_dir : str, optional, default “”
 Path to a reference forward run. This is used when using the approximate operator to accelerate its computation.
- approx_operator : optional
 Approximate the observation operator outside the given interval
- Argument structure:
 - datei : str, mandatory
 Start date of the interval on which to compute the real operator
- datef : str, mandatory
 End date of the interval on which to compute the real operator
- batch_computation : optional
 Compute perturbed samples of the control vector within the same observation operator
- Argument structure:
 - nsamples : int, mandatory
 Number of samples to generate
- dir_samples : str, mandatory
 Directory where to fetch sample control vectors
- file_samples : str, optional, default “controlvect_ensemble.pickle”
 Sample control vectors file name
- dont_propagate : list, optional
 list of (component, parameter) tuples that should not be propagated
- dont_propagate_obsvect : list, optional
 List of (component, parameter) tuples that the 'toobsvect' transformation should not propagate
- ignore_model : bool, optional, default False
 Do not run the model as part of the observation operator.
- force_propagate_attributes : bool, optional, default False
 Force the propagation of attributes throughout transforms. Use with caution.
- monitor_memory : bool, optional, default False
 Print memory usage for each transform.
- clean_memory : bool, optional, default True
 Clean datastores that are not used anymore
- autokill_time : str, optional
 Stops the running simulation after a given time and automatically re-submits it in a new job. Should be one of Pandas' offset aliases; for example, use '23h' to stop the simulation after 23 hours. When using this option, a platform plugin with the options needed for submitting a job is required.
- max_resubmissions : int, optional, default 0
 Maximum number of times the simulation can be automatically re-submitted in a job.
Requirements#
The current plugin requires the following plugins to run properly:
| Requirement name | Requirement type | Explicit definition | Any valid | Default name | Default version |
|---|---|---|---|---|---|
| model | model | False | True | None | None |
| obsvect | obsvect | True | True | standard | std |
| controlvect | controlvect | True | True | standard | std |
| datavect | datavect | True | True | standard | std |
| platform | platform | True | True | None | None |
YAML template#
Please find below a template for a YAML configuration:
obsoperator:
  plugin:
    name: standard
    version: std
    type: obsoperator

  # Optional arguments
  autorestart: XXXXX  # bool
  autoflush: XXXXX  # bool
  force-full-flush: XXXXX  # bool
  save_debug: XXXXX  # bool
  force_full_operator: XXXXX  # bool
  init_inputs:
    any_key:
      parameters: XXXXX  # list
  transform_pipe:
    any_key:
      **args: XXXXX  # any
  parallel:
    segments: XXXXX  # str
    overlap: XXXXX  # str
    subprocess: XXXXX  # bool
    nproc: XXXXX  # int
  ref_fwd_dir: XXXXX  # str
  approx_operator:
    datei: XXXXX  # str
    datef: XXXXX  # str
  batch_computation:
    nsamples: XXXXX  # int
    dir_samples: XXXXX  # str
    file_samples: XXXXX  # str
    dont_propagate: XXXXX  # list
    dont_propagate_obsvect: XXXXX  # list
  ignore_model: XXXXX  # bool
  force_propagate_attributes: XXXXX  # bool
  monitor_memory: XXXXX  # bool
  clean_memory: XXXXX  # bool
  autokill_time: XXXXX  # str
  max_resubmissions: XXXXX  # int