Models model#

Available Models model#

The following models are implemented in pyCIF so far:

Documentation#

Description#

The model class runs chemistry-transport models, processes their outputs and generates their inputs. Note that models are often written in high-performance languages such as Fortran or C. In these cases, their sources are included in the directory model_sources provided alongside pyCIF.

Required parameters, dependencies and functions#

The following attributes, dependencies and functions should be defined for any model, as they are called by other plugins. They can be parameters to define at the set-up step, functions to implement in the corresponding module, or dependencies to be attached to the model class.

Parameters and attributes#

Initialization parameters#

The following attributes are defined once and for all at the initialization of the model; they inform pyCIF about the temporal resolution of the model. All of them contain datetime.datetime objects. To make the handling of lists easier, pyCIF requires lists to be implemented as numpy.array.

Among them, only subsimu_dates is mandatory; the others are recommended, as they can be used elsewhere, in particular to define the mapper.

subsimu_dates:

the list of simulation periods if the model simulation window is split into shorter sub-periods

tstep_dates:

the time steps at which the model carries out its numerical computations; pyCIF uses this attribute to determine which observation to compare to which model time step. It is a dictionary whose keys are the elements of subsimu_dates and whose values are the arrays of time steps of each sub-period.

tstep_all:

the same time steps as tstep_dates, but stored as a single array containing all time steps of all sub-periods instead of a dictionary split by sub-period

input_dates:

the dates at which the model expects inputs; same structure as tstep_dates

Please find below an illustration of the different time steps:

Figure: time steps of a simulation split into two sub-periods.
  Global time scale: steps 1 to 6 in the 1st sub-period and 7 to 12 in the 2nd sub-period (numbered continuously).
  Local time scale: steps 1 to 6 in each sub-period (numbering restarts at each sub-period).
  Observation 1 and Observation 2: each observation has a sampling period drawn against these time steps.

In this example, the model is run from January 1st, 2010 to February 28th, 2010, split into two monthly sub-periods. Computations are carried out every hour and inputs are expected every 3 hours. In that case, the temporal variables are:

import datetime

import numpy as np


def hourly_range(start, end, hours):
    """Helper for this example: datetimes from start to end (both included) every `hours` hours."""
    step = datetime.timedelta(hours=hours)
    dates = []
    while start <= end:
        dates.append(start)
        start += step
    return np.array(dates)


# Two monthly sub-periods
subsimu_dates = np.array([datetime.datetime(2010, 1, 1), datetime.datetime(2010, 2, 1)])

# Hourly model time steps, grouped by sub-period
tstep_dates = {
    datetime.datetime(2010, 1, 1): hourly_range(
        datetime.datetime(2010, 1, 1, 0), datetime.datetime(2010, 1, 31, 23), 1),
    datetime.datetime(2010, 2, 1): hourly_range(
        datetime.datetime(2010, 2, 1, 0), datetime.datetime(2010, 2, 28, 23), 1),
}

# All time steps of all sub-periods in a single array
tstep_all = np.concatenate([tstep_dates[ddi] for ddi in subsimu_dates])

# Input dates every 3 hours, grouped by sub-period
input_dates = {
    datetime.datetime(2010, 1, 1): hourly_range(
        datetime.datetime(2010, 1, 1, 0), datetime.datetime(2010, 1, 31, 21), 3),
    datetime.datetime(2010, 2, 1): hourly_range(
        datetime.datetime(2010, 2, 1, 0), datetime.datetime(2010, 2, 28, 21), 3),
}

Online parameters#

The following variables are defined online during the computation of the model.

chain:

for a given model simulation, the files from previous sub-periods that are needed to run the following sub-periods are stored in current_sim_directory/chain; the chain variable stores the date of the last computed sub-period. The variable is updated automatically by the obsoperator, but the files must be moved by the run function of the model.

adj_refdir:

the directory where the forward simulations corresponding to the adjoint being run are stored; the run function should update this variable whenever it runs a forward simulation.

Dependencies#

Other classes in pyCIF expect the model class to have a domain class attached to it, describing the model domain, so that model.domain can be accessed.

Functions#

The following functions need to be implemented in any model to make it interact with other classes. They must be imported at the root level of the corresponding python package, i.e. in the __init__.py file:

from XXXXX import ini_periods
from XXXXX import run
from XXXXX import make_auxiliary
from XXXXX import native2inputs
from XXXXX import native2inputs_adj
from XXXXX import outputs2native
from XXXXX import outputs2native_adj
from XXXXX import compile
from XXXXX import ini_mapper

It is recommended to include each function in a separate file to avoid very long scripts.

ini_periods (optional)#
pycif.plugins.models.template.ini_periods(self, **kwargs)

The function ini_periods is optional but strongly recommended. It defines the temporal variables subsimu_dates, input_dates, tstep_dates and tstep_all. The function is called automatically at the initialization of the model class, if available. Otherwise, the temporal variables must be defined manually in the ini_data function (not recommended).

ini_periods is a method of the model plugin itself. Therefore, the only expected argument is self.

def ini_periods(self, **kwargs):

    self.subsimu_dates = XXXX
    self.tstep_dates = XXXXX
    self.input_dates = XXXXX
    self.tstep_all = XXXXX
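To make the template above more concrete, here is a minimal sketch of an ini_periods function. It assumes hypothetical attributes that are not part of the template: self.datei and self.datef (simulation window) and self.subsimu_length, self.tstep_length and self.input_length (datetime.timedelta objects). For simplicity, sub-periods have a fixed length, so monthly sub-periods as in the example earlier on this page would need calendar-aware logic instead.

```python
import datetime
from types import SimpleNamespace

import numpy as np


def _date_range(start, end, step):
    """Return datetimes from start (included) to end (excluded) every step."""
    dates = []
    current = start
    while current < end:
        dates.append(current)
        current += step
    return np.array(dates)


def ini_periods(self, **kwargs):
    """Fill the temporal attributes of a model plugin (illustrative sketch).

    Hypothetical attributes assumed to be set beforehand: self.datei and
    self.datef (simulation window), and self.subsimu_length,
    self.tstep_length and self.input_length (datetime.timedelta objects).
    """
    self.subsimu_dates = _date_range(self.datei, self.datef, self.subsimu_length)

    self.tstep_dates = {}
    self.input_dates = {}
    for ddi in self.subsimu_dates:
        # A sub-period ends where the next one starts, or at datef
        ddf = min(ddi + self.subsimu_length, self.datef)
        self.tstep_dates[ddi] = _date_range(ddi, ddf, self.tstep_length)
        self.input_dates[ddi] = _date_range(ddi, ddf, self.input_length)

    # Flat array of all time steps, sub-period after sub-period
    self.tstep_all = np.concatenate(
        [self.tstep_dates[ddi] for ddi in self.subsimu_dates])


# Usage: a stand-in object replaces the real model plugin
model = SimpleNamespace(
    datei=datetime.datetime(2010, 1, 1),
    datef=datetime.datetime(2010, 1, 3),
    subsimu_length=datetime.timedelta(days=1),
    tstep_length=datetime.timedelta(hours=1),
    input_length=datetime.timedelta(hours=3),
)
ini_periods(model)
print(len(model.subsimu_dates), len(model.tstep_all))  # 2 48
```

Building tstep_all from tstep_dates, rather than computing it separately, guarantees that the two attributes stay consistent.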

The ini_periods function of the model CHIMERE gives an example:

pycif.plugins.models.chimere.ini_periods(self, **kwargs)

run#

The function run executes the model itself. As models are often computationally expensive, they are usually not written in Python. The run function therefore calls a previously compiled external executable.

There are several ways to call system executables in Python. We recommend the function subprocess.Popen for that purpose: it gives flexibility in logging and can capture errors during the execution of the external executable.
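A minimal sketch of calling an external executable with subprocess.Popen, as recommended above. The helper name run_executable and the log-file convention are hypothetical; the Python interpreter stands in for a compiled model executable so that the example runs anywhere.

```python
import os
import subprocess
import sys
import tempfile


def run_executable(cmd, runsubdir, logfile):
    """Run an external executable from runsubdir and log its output.

    stdout and stderr are both written to logfile; a non-zero exit code
    raises RuntimeError so that failures are not silently ignored.
    """
    with open(logfile, "w") as log:
        process = subprocess.Popen(
            cmd, cwd=runsubdir, stdout=log, stderr=subprocess.STDOUT)
        returncode = process.wait()
    if returncode != 0:
        raise RuntimeError(
            "{} failed with exit code {}; see {}".format(cmd[0], returncode, logfile))
    return returncode


# Usage: the Python interpreter stands in for a compiled model executable
with tempfile.TemporaryDirectory() as runsubdir:
    logfile = os.path.join(runsubdir, "model.log")
    run_executable([sys.executable, "-c", "print('model done')"], runsubdir, logfile)
    with open(logfile) as f:
        print(f.read().strip())  # model done
```

Setting cwd to runsubdir means that relative paths in the model configuration resolve inside the current sub-simulation directory.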

Other tasks carried out by the run function are:

  • update the variable self.adj_refdir for later adjoint simulations

  • update the variable self.chain for later sub-periods and move necessary files to that directory;

    these files include for instance concentration fields at the last time step of the period, to be used as initial conditions for the next period.
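The file-moving task listed above could look like the following sketch. It assumes, hypothetically, that the chain directory is a folder named chain in the parent of runsubdir (this page mentions both current_sim_directory/chain and the parent of runsubdir, so the path should be adapted to the actual model). The file name end_conc.nc is also hypothetical, and updating self.chain is left out.

```python
import os
import shutil
import tempfile


def move_chain_files(runsubdir, filenames):
    """Move files needed by the next sub-period to the chain directory.

    Assumes (hypothetically) that the chain directory is "chain" next to
    runsubdir, i.e. in the parent simulation directory.
    """
    chain_dir = os.path.join(runsubdir, os.pardir, "chain")
    os.makedirs(chain_dir, exist_ok=True)
    moved = []
    for name in filenames:
        destination = os.path.join(chain_dir, name)
        shutil.move(os.path.join(runsubdir, name), destination)
        moved.append(os.path.normpath(destination))
    return moved


# Usage with a hypothetical end-of-period concentration file
with tempfile.TemporaryDirectory() as simdir:
    runsubdir = os.path.join(simdir, "2010-01-01")
    os.makedirs(runsubdir)
    open(os.path.join(runsubdir, "end_conc.nc"), "w").close()
    moved = move_chain_files(runsubdir, ["end_conc.nc"])
    print(os.path.relpath(moved[0], simdir))  # chain/end_conc.nc (on POSIX)
```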

pycif.plugins.models.template.run(self, runsubdir, mode, workdir, ddi, nbproc=1, do_simu=True, approx_transf=False, ref_fwd_dir='', overlap=False, **kwargs)

Run the model in forward, tangent-linear or adjoint mode. This includes:

  • executing the model external executable

  • updating adj_refdir

  • moving files needed for chained simulations to "{}/../".format(runsubdir)

Note:

For models whose adjoint is not coded, make sure to raise a clear error if the run function is called in adj mode with do_simu = True.
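A minimal sketch of the guard described in the note above, for a model without a coded adjoint. Only the beginning of run is shown; the signature is reduced to the arguments used here (the others would go through **kwargs), and the choice of NotImplementedError is an assumption.

```python
def run(self, runsubdir, mode, workdir, ddi, do_simu=True, **kwargs):
    """Sketch of the guard for a model without a coded adjoint."""
    if mode == "adj" and do_simu:
        raise NotImplementedError(
            "The adjoint of this model is not available: "
            "run cannot be called in 'adj' mode with do_simu=True")
    if not do_simu:
        # Nothing to run: an existing simulation is reused
        return
    # ... call the forward or tangent-linear executable here ...


# Usage: calling the adjoint fails early with an explicit message
try:
    run(None, "runsubdir", "adj", "workdir", None)
except NotImplementedError as err:
    print("Error:", err)
```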

Args:

self: the model Plugin
runsubdir (str): working directory for the current run
mode (str): forward or backward
workdir (str): pyCIF working directory
do_simu (bool): whether to run the simulation (True) or reuse an existing one (False)

The run function of the model CHIMERE gives a full example:

pycif.plugins.models.chimere.run(self, runsubdir, mode, workdir, ddi, nbproc=1, do_simu=True, approx_transf=False, ref_fwd_dir='', overlap=False, **kwargs)

Run the CHIMERE model in forward mode

Args:

self: the model Plugin
runsubdir (str): working directory for the current run
mode (str): forward or backward
workdir (str): pyCIF working directory
do_simu (bool): whether to run the simulation (True) or reuse an existing one (False)

ini_mapper#

The function ini_mapper defines the mapper, which gives meta-data about the model’s inputs and outputs. The ini_mapper function of a model has the same structure as the ini_mapper functions of the transform Plugins. Please consult the corresponding page for further details.

outputs2native and outputs2native_adj#

The function outputs2native reads the model outputs, while outputs2native_adj generates the sensitivity to the outputs.

pycif.plugins.models.template.outputs2native_adj(self, data2dump, input_type, di, df, runsubdir, mode='fwd', onlyinit=False, do_simu=True, check_transforms=False, **kwargs)

Dumps and/or saves information about outputs, so that the model knows where to extract information.

In the present template, observations are simply saved for later use by outputs2native. If the model needs information to extract concentrations on the fly, the information in data2dump should be used. In particular, the columns i and j give the row and column of each observation in the domain. The column tstep indicates the time step spanned by the observation, relative to the variable output_intervals defined in ini_mapper.

The function is called by loadfromoutputs.adjoint.

Args:

self: the model itself
data2dump (dict): a dictionary with concentration data for each component/tracer
input_type (str): the type of model outputs to be processed; this information is redundant with the components of the data2dump dictionary
di (datetime.datetime): starting date of the present sub-simulation
df (datetime.datetime): ending date of the present sub-simulation
runsubdir (str): path to the present sub-simulation work directory
mode (str): running mode; one of "fwd", "tl" and "adj"
onlyinit (bool): if True, the function is called during the initialization process of the observation vector
do_simu (bool): if False, the observation vector retrieves information from a previous existing run; in that case, it may not be necessary to dump files
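The description of outputs2native_adj states that the columns i, j and tstep of data2dump locate each observation in the domain and in time. A minimal sketch of how a model could use them to sample a gridded field, assuming (hypothetically) a NumPy array laid out as (tstep, i, j) and integer indices that are already valid for that array:

```python
import numpy as np
import pandas as pd


def extract_at_observations(field, obs):
    """Sample a model field at observation locations and time steps.

    field: array of shape (n_tstep, n_rows, n_cols) (hypothetical layout)
    obs: DataFrame with integer columns tstep, i and j
    """
    return field[obs["tstep"].to_numpy(), obs["i"].to_numpy(), obs["j"].to_numpy()]


# Usage with a small synthetic field: value = 100 * tstep + 10 * i + j
field = np.fromfunction(lambda t, i, j: 100 * t + 10 * i + j, (3, 4, 5))
obs = pd.DataFrame({"tstep": [0, 2], "i": [1, 3], "j": [4, 0]})
print(extract_at_observations(field, obs).tolist())  # [14.0, 230.0]
```

NumPy advanced indexing with three index arrays of equal length returns one value per observation, without a Python loop.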

pycif.plugins.models.template.outputs2native(self, data2dump, input_type, di, df, runsubdir, mode='fwd', onlyinit=False, check_transforms=False, **kwargs)

Reads model outputs into pyCIF objects.

Args:

self: the model itself
data2dump (dict): a dictionary with the output data structure, to be filled with the correct data for every component/tracer
input_type (str): the type of model outputs to be processed; this information is redundant with the components of the data2dump dictionary
di (datetime.datetime): starting date of the present sub-simulation
df (datetime.datetime): ending date of the present sub-simulation
runsubdir (str): path to the present sub-simulation work directory
mode (str): running mode; one of "fwd", "tl" and "adj"
onlyinit (bool): if True, the function is called during the initialization process of the observation vector
do_simu (bool): if False, the observation vector retrieves information from a previous existing run; in that case, it may not be necessary to dump files

Return:

dict: a dictionary whose keys are the components/tracers to be extracted

Note:

The input data data2dump has a dictionary structure with two levels: component/tracer and date. This reads as:

data2dump = {
    (comp1, tracer1): {
        dd0: pd.DataFrame,
        dd1: pd.DataFrame,
        # ...
    }
}

In the output, the date level should be removed, and only the outputs corresponding to the present sub-simulation (di) should be included.
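A minimal sketch of removing the date level of data2dump, as described in the note above. It assumes that the date keys are the start dates of the sub-simulations, so that di can be used directly as a key; the helper name, component/tracer names and column name are hypothetical.

```python
import datetime

import pandas as pd


def keep_present_period(data2dump, di):
    """Drop the date level of data2dump, keeping only sub-simulation di.

    Components/tracers without data for di are left out.
    """
    return {
        trid: per_date[di]
        for trid, per_date in data2dump.items()
        if di in per_date
    }


# Usage with hypothetical component/tracer names and columns
jan, feb = datetime.datetime(2010, 1, 1), datetime.datetime(2010, 2, 1)
data2dump = {
    ("concs", "CH4"): {
        jan: pd.DataFrame({"sim": [1.0, 2.0]}),
        feb: pd.DataFrame({"sim": [3.0]}),
    }
}
outputs = keep_present_period(data2dump, jan)
print(outputs[("concs", "CH4")]["sim"].tolist())  # [1.0, 2.0]
```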

native2inputs and native2inputs_adj#

The function native2inputs generates inputs for the model executable, while native2inputs_adj reads the sensitivity to the inputs as computed by the adjoint.

pycif.plugins.models.template.native2inputs(self, datastore, input_type, datei, datef, runsubdir, mode='fwd', onlyinit=False, do_simu=True, check_transforms=False, **kwargs)

Converts data stored in datastore, at the model resolution, into model-compatible input files.

native2inputs is called for every component/tracer pair defined in the mapper.

Args:

self: the model Plugin
input_type (str): the component name to be treated; note that this information is redundant with the keys in datastore
datastore: data to convert
datei, datef: date interval of the sub-simulation
mode (str): running mode: one of 'fwd', 'adj' and 'tl'
runsubdir (str): sub-directory for the current simulation
workdir (str): the directory of the whole pyCIF simulation

Note:

The format of datastore combines the model mapper and the pyCIF data format (see the corresponding documentation page).

For each component/tracer, the data itself is stored in the key data, while all the other keys come from the mapper, in case they are useful to dump inputs in the correct format.

Note:

If the input data is fully consistent with what the model expects, the data itself is not read by pyCIF. Instead, the files defined by the key input_files (set in the fetch function of the corresponding flux plugin) can be linked directly.
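The two notes above describe two paths for each component/tracer: dump the content of the key data, or link the files listed in input_files. A minimal sketch under hypothetical conventions: input_files is taken to be a list of paths, data is saved with numpy.save under a made-up name "<component>_<tracer>.npy", and the other mapper keys are ignored. A real model would write its own input format.

```python
import os
import tempfile

import numpy as np


def dump_or_link_inputs(datastore, runsubdir):
    """Write or link one input per component/tracer into runsubdir.

    Hypothetical conventions: if a tracer entry has "input_files" (a list
    of paths), those files are linked as-is; otherwise its "data" array is
    saved to "<component>_<tracer>.npy".
    """
    written = []
    for (comp, trcr), entry in datastore.items():
        if "input_files" in entry:
            # Files already consistent with the model: link, do not read
            for path in entry["input_files"]:
                target = os.path.join(runsubdir, os.path.basename(path))
                os.symlink(os.path.abspath(path), target)
                written.append(target)
        else:
            target = os.path.join(runsubdir, "{}_{}.npy".format(comp, trcr))
            np.save(target, np.asarray(entry["data"]))
            written.append(target)
    return written


# Usage: one tracer with data to dump, one with a ready-made file
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as runsubdir:
    ready = os.path.join(src, "emis_ready.bin")
    open(ready, "w").close()
    datastore = {
        ("fluxes", "CH4"): {"data": np.zeros((2, 3))},
        ("fluxes", "CO2"): {"input_files": [ready]},
    }
    for path in dump_or_link_inputs(datastore, runsubdir):
        print(os.path.basename(path), os.path.islink(path))
```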

pycif.plugins.models.template.native2inputs_adj(self, datastore, input_type, datei, datef, runsubdir, mode='fwd', check_transforms=False, **kwargs)

Reads adjoint sensitivities and formats them to the pyCIF data format.

Warning

This function is used only when the adjoint of the model is available.

Args:

self: the model Plugin
input_type (str): one of 'flux'
datastore: data to convert if input_type == 'flux'
datei, datef: date interval of the sub-simulation
mode (str): running mode: one of 'fwd', 'adj' and 'tl'
runsubdir (str): sub-directory for the current simulation
workdir (str): the directory of the whole pyCIF simulation

make_auxiliary (optional)#

This function is called at the same time as native2inputs. It generates all the required information and files that do not come from the datavect through the mapper, since those are initialized by native2inputs.

Example of code:

pycif.plugins.models.template.make_auxiliary(self, ddi, runsubdir, do_simu=True, mode='fwd', **kwargs)

Initialize every file or information needed by the model to run, excluding data that are initialized through the function native2inputs.

This includes Fortran namelists, configuration files, etc.

All basic files related to the model should first be initialized in self.workdir/model at the initialization step, in the function compile.

Files are then linked or copied to runsubdir from the reference files in self.workdir/model.
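The linking step described above can be sketched as follows. The helper name link_reference_files and the file name model.exe are hypothetical; replacing existing links makes the function safe to call again when a sub-period is re-run.

```python
import os
import tempfile


def link_reference_files(workdir, runsubdir, filenames):
    """Link reference files from workdir/model into runsubdir.

    Existing links or files with the same name in runsubdir are replaced,
    so the function can be called again when a sub-period is re-run.
    """
    model_dir = os.path.join(workdir, "model")
    for name in filenames:
        target = os.path.join(runsubdir, name)
        if os.path.lexists(target):
            os.remove(target)
        os.symlink(os.path.abspath(os.path.join(model_dir, name)), target)


# Usage with a hypothetical reference executable prepared by compile
with tempfile.TemporaryDirectory() as workdir:
    os.makedirs(os.path.join(workdir, "model"))
    open(os.path.join(workdir, "model", "model.exe"), "w").close()
    runsubdir = os.path.join(workdir, "2010-01-01")
    os.makedirs(runsubdir)
    link_reference_files(workdir, runsubdir, ["model.exe"])
    link_reference_files(workdir, runsubdir, ["model.exe"])  # safe to repeat
    print(os.path.islink(os.path.join(runsubdir, "model.exe")))  # True
```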

Note:

Configuration files should follow these basic rules:

  • paths expected by the model should always point to the current runsubdir; the executable should therefore be linked or copied into runsubdir; in addition, every extra file should be linked under a fixed name, and that name should be given in the namelist or configuration file.

  • as many model parameters as possible should be easy to modify through the yaml configuration file; however, it may be preferable to restrict pyCIF by keeping some parameters fixed; this choice is left to the developer implementing the model.
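Following the first rule above, the namelist only needs to refer to the fixed names of files linked in runsubdir. A minimal sketch of generating a Fortran namelist group from Python values; the group name model_params and the parameter names are hypothetical, and only strings, booleans and numbers are handled.

```python
def format_namelist(group, params):
    """Return a Fortran namelist group as text."""
    lines = ["&{}".format(group)]
    for key, value in params.items():
        # bool is checked before numbers because bool is a subclass of int
        if isinstance(value, bool):
            value = ".true." if value else ".false."
        elif isinstance(value, str):
            value = "'{}'".format(value)
        lines.append("  {} = {}".format(key, value))
    lines.append("/")
    return "\n".join(lines) + "\n"


# Usage: the extra file is linked in runsubdir under a fixed name,
# and the namelist only refers to that name
text = format_namelist("model_params", {
    "emis_file": "EMISSIONS.nc",
    "nhours": 24,
    "adjoint": False,
})
print(text)
```

The printed text is the group header &model_params, one key = value line per parameter, and a closing slash.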

Args:

self: the model plugin
ddi (datetime.datetime): the start date identifying the present simulation period
runsubdir (str): path to the current sub-simulation work directory
do_simu (bool): if False, the simulation does not need to be run; hence, in principle, no auxiliary data needs to be initialized
mode (str): the running mode to compute

compile (optional)#

The function copies, links or compiles the necessary model executables and, if needed, auxiliary configuration files.

pycif.plugins.models.template.compile(self)

The compile function initializes all model information and executables prior to any run. Files must be copied into $workdir/model.

This includes:

  • copying the executable, if it exists

    Warning

    It is recommended to copy executable files to make sure that later simulations in the present pyCIF computation use the same executable. Indeed, one may run very long inversions in the background while carrying on developing the model, forgetting about the background inversions; the developments could then break these inversions or, worse, change their results without any error.

  • copying the sources and compiling them if no executable is available, or if re-compilation is explicitly requested.

    Note

    As much as possible, the model should be compiled within pyCIF, to guarantee the traceability of the compilation options and to handle platform specificities through the platform Plugin (see the platform Plugin documentation for details).

  • copying extra configuration files, e.g., namelist templates.
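The steps listed above can be sketched as follows. The executable is copied rather than linked, following the warning. The build step is an assumption: it runs `make` in a copy of the sources and expects a binary with the same name as the executable, which real models will likely do differently; shutil.copytree with dirs_exist_ok requires Python 3.8 or later. Only the copy path is exercised in the usage below.

```python
import os
import shutil
import subprocess
import tempfile


def compile_model(workdir, executable, sources_dir=None, force_compile=False):
    """Copy (or build) the model executable into workdir/model.

    The executable is copied rather than linked, so that later runs of the
    present computation keep using the same binary. If no executable exists
    or force_compile is True, the sources are copied and built with `make`
    (a hypothetical build command; real models may differ).
    """
    model_dir = os.path.join(workdir, "model")
    os.makedirs(model_dir, exist_ok=True)
    exe_name = os.path.basename(executable)

    if not os.path.isfile(executable) or force_compile:
        build_dir = os.path.join(model_dir, "sources")
        shutil.copytree(sources_dir, build_dir, dirs_exist_ok=True)
        subprocess.run(["make"], cwd=build_dir, check=True)
        # Hypothetical: the build produces a binary with the same name
        executable = os.path.join(build_dir, exe_name)

    return shutil.copy2(executable, os.path.join(model_dir, exe_name))


# Usage with a hypothetical pre-built executable
with tempfile.TemporaryDirectory() as tmp:
    exe = os.path.join(tmp, "model.e")
    with open(exe, "w") as f:
        f.write("binary")
    copied = compile_model(os.path.join(tmp, "workdir"), exe)
    print(os.path.islink(copied))  # False
```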

flushrun (optional)#

The function flushrun is called at the end of a simulation. It removes all temporary files that take up disk space and are not needed afterwards.

Arguments are:

self:

the model itself

rundir:

the run directory (with all the sub-period simulations)

mode:

the running mode; one of fwd, tl or adj.

The function returns nothing.
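A minimal sketch of a flushrun function with the arguments listed above. The patterns argument and its default "*.tmp" are hypothetical; a real implementation decides which files are safe to delete, for instance keeping the forward outputs that a later adjoint needs, possibly depending on mode.

```python
import glob
import os
import tempfile


def flushrun(self, rundir, mode, patterns=("*.tmp",)):
    """Delete temporary files matching patterns anywhere under rundir.

    mode is kept for interface compatibility; a real implementation may keep
    different files depending on the running mode. Returns nothing.
    """
    for pattern in patterns:
        for path in glob.glob(os.path.join(rundir, "**", pattern), recursive=True):
            if os.path.isfile(path) or os.path.islink(path):
                os.remove(path)


# Usage: only the file matching the hypothetical pattern is removed
with tempfile.TemporaryDirectory() as rundir:
    subdir = os.path.join(rundir, "2010-01-01")
    os.makedirs(subdir)
    for name in ("scratch.tmp", "restart.nc"):
        open(os.path.join(subdir, name), "w").close()
    flushrun(None, rundir, "fwd")
    print(sorted(os.listdir(subdir)))  # ['restart.nc']
```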

The flushrun function of the model CHIMERE gives a full example:

pycif.plugins.models.chimere.flushrun(self, rundir, mode, transform_id, full_flush=True)

Cleans the simulation directories to limit disk space usage.