Models model#
Available Models model#
The following models
are implemented in pyCIF so far:
- CHIMERE/std
- CHIMERE with OpenACC annotations: CHIMERE/acc
- ICOsahedral Nonhydrostatic weather- and climate model with Aerosols and Reactive Trace gases: ICON-ART/std
- LMDZ/std
- LMDz with OpenACC annotations: LMDZ/acc
- Lagrangian/std
- TM5/std
- Template for model implementation: template/std
- dummy/std
- wrfchem/std
Documentation#
Description#
The model class runs chemistry-transport models, processes their outputs and generates their inputs.
Please note that models are often written in high-performance languages such as Fortran or C.
In these cases, the sources are included in the directory model_sources provided alongside pyCIF.
Required parameters, dependencies and functions#
The following attributes, dependencies and functions should be defined for any model, as they are called by other plugins.
They can be parameters to define at the set-up step, functions to implement in the corresponding module, or dependencies to be attached to the model class.
Parameters and attributes#
Initialization parameters#
The following attributes are defined once and for all at the initialization of the model; they inform pyCIF about the temporal resolution of the model. All the following objects are filled with datetime.datetime objects. To make the handling of lists easier, pyCIF requires lists to be implemented as numpy.array.
Below, only subsimu_dates is mandatory; the others are recommended, as they are called elsewhere, in particular for the definition of the mapper.
- subsimu_dates: the list of simulation periods if the model simulation window is split into shorter sub-periods
- tstep_dates: the time steps at which the model carries out its numerical computations; this argument is used by pyCIF to determine which observation to compare to which model time step. It is a dictionary whose keys are the entries of subsimu_dates and whose values are the lists of time steps corresponding to each sub-period.
- tstep_all: the same as tstep_dates; the difference is that tstep_all is a list containing all time steps of all simulation sub-periods instead of a dictionary split into sub-periods
- input_dates: dates at which the model expects some inputs; has the same shape as tstep_dates
Please find below an illustration of the different time steps:
[Figure: table comparing the global time scale (time steps numbered 1 to 12 across the 1st and 2nd sub-periods) with the local time scale (time steps numbered 1 to 6 within each sub-period); the rows Observation 1 and Observation 2 each show a sampling period against the two sub-periods.]
In the following example, the model is run from January 1st, 2010 to February 28th, 2010. Computations are carried out every hour and inputs are expected every 3 hours. In that case, the temporal variables are:
import datetime
import numpy as np
subsimu_dates = np.array([datetime.datetime(2010, 1, 1), datetime.datetime(2010, 2, 1)])
tstep_dates = {
    datetime.datetime(2010, 1, 1): np.array(
        [datetime.datetime(2010, 1, 1, 0), datetime.datetime(2010, 1, 1, 1),
         ..., datetime.datetime(2010, 1, 31, 23)]),
    datetime.datetime(2010, 2, 1): np.array(
        [datetime.datetime(2010, 2, 1, 0), datetime.datetime(2010, 2, 1, 1),
         ..., datetime.datetime(2010, 2, 28, 23)]),
}
tstep_all = np.array([
    datetime.datetime(2010, 1, 1, 0), datetime.datetime(2010, 1, 1, 1),
    ..., datetime.datetime(2010, 2, 28, 23)
])
input_dates = {
    datetime.datetime(2010, 1, 1): np.array(
        [datetime.datetime(2010, 1, 1, 0), datetime.datetime(2010, 1, 1, 3),
         ..., datetime.datetime(2010, 1, 31, 21)]),
    datetime.datetime(2010, 2, 1): np.array(
        [datetime.datetime(2010, 2, 1, 0), datetime.datetime(2010, 2, 1, 3),
         ..., datetime.datetime(2010, 2, 28, 21)]),
}
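The temporal variables of this example can also be generated programmatically. The following is a minimal sketch and not pyCIF code: the helper date_range and the variable bounds are hypothetical names introduced for illustration, and the bisect lookup is only one possible way of finding the model time step in which an observation falls.

```python
import bisect
import datetime

import numpy as np


def date_range(start, end, hours):
    """Return an array of datetimes from start (included) to end (excluded)."""
    step = datetime.timedelta(hours=hours)
    dates = []
    current = start
    while current < end:
        dates.append(current)
        current += step
    return np.array(dates, dtype=object)


# Start dates of the sub-periods, followed by the end of the simulation window
bounds = [
    datetime.datetime(2010, 1, 1),
    datetime.datetime(2010, 2, 1),
    datetime.datetime(2010, 3, 1),
]

subsimu_dates = np.array(bounds[:-1], dtype=object)
# Hourly computation time steps and 3-hourly input dates, per sub-period
tstep_dates = {di: date_range(di, df, 1) for di, df in zip(bounds[:-1], bounds[1:])}
input_dates = {di: date_range(di, df, 3) for di, df in zip(bounds[:-1], bounds[1:])}
# Flat version of tstep_dates, covering all sub-periods
tstep_all = np.concatenate([tstep_dates[di] for di in subsimu_dates])

# Index of the model time step that contains a given observation date
obs_date = datetime.datetime(2010, 2, 1, 0, 30)
index = bisect.bisect_right(list(tstep_all), obs_date) - 1
```

Here index is a position on the global time scale; subtracting the number of time steps of the earlier sub-periods would give the position on the local time scale of the sub-period.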
Online parameters#
The following variables are defined online during the computation of the model.
- chain: for a given model simulation, files from previous sub-periods that are necessary to run the following sub-periods are stored in current_sim_directory/chain; the chain variable stores the date of the previous sub-period that was computed; the variable is automatically updated by the obsoperator, but the files should be moved by the run function of the model.
- adj_refdir: the directory where the forward simulations corresponding to the adjoint being run are stored; the variable should be updated when running a forward simulation in the run function.
Dependencies#
Some other classes in pyCIF expect the model class to have a domain class attached to it, describing the model domain.
This way, model.domain can be called.
Functions#
The following functions need to be implemented in any model to make it interact with other classes.
They must be imported at the root level of the corresponding Python package, i.e. in the __init__.py file:
from XXXXX import ini_periods
from XXXXX import run
from XXXXX import make_auxiliary
from XXXXX import native2inputs
from XXXXX import native2inputs_adj
from XXXXX import outputs2native
from XXXXX import outputs2native_adj
from XXXXX import compile
from XXXXX import ini_mapper
It is recommended to include each function in a separate file to avoid very long scripts.
ini_periods (optional)#
- pycif.plugins.models.template.ini_periods(self, **kwargs)
The function ini_periods is optional but strongly recommended. It is used to define the temporal variables subsimu_dates, input_dates, tstep_dates and tstep_all. The function is automatically called at the initialization of the model class if available. If not available, the temporal variables should be defined manually in the ini_data function (not recommended).
ini_periods is a class method that applies to the model plugin itself. Therefore, the only expected argument is self.
def ini_periods(self, **kwargs):
    self.subsimu_dates = XXXX
    self.tstep_dates = XXXXX
    self.input_dates = XXXXX
    self.tstep_all = XXXXX
See the ini_periods function of the model CHIMERE for an example:
- pycif.plugins.models.chimere.ini_periods(self, **kwargs)
run#
The function run executes the model itself.
As models are often computationally expensive to run, they are not written in Python.
Therefore, the run function calls an external executable compiled previously.
There are several ways to call system executables in Python. We recommend using subprocess.Popen for that purpose. It gives flexibility in logging and can capture errors during the execution of the external executable.
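A minimal sketch of such a call is given below, assuming that the standard output and error of the executable are redirected to a log file in the run directory. The function name run_executable and the log file name model.log are hypothetical; the Python interpreter is used as a stand-in for a real model executable.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_executable(command, rundir, log_name="model.log"):
    """Run command in rundir, log its output and raise if it fails."""
    log_path = Path(rundir) / log_name
    with open(log_path, "w") as log_file:
        # stderr is merged into stdout so that errors end up in the log
        process = subprocess.Popen(
            command, cwd=rundir, stdout=log_file, stderr=subprocess.STDOUT
        )
        return_code = process.wait()
    if return_code != 0:
        raise RuntimeError(
            "Model executable failed with code {}; see {}".format(
                return_code, log_path
            )
        )
    return log_path


# Usage with a stand-in "executable": the Python interpreter itself
with tempfile.TemporaryDirectory() as rundir:
    log_path = run_executable(
        [sys.executable, "-c", "print('model done')"], rundir
    )
    log_content = log_path.read_text()
```

Checking the return code explicitly makes a crash of the executable stop the pyCIF computation instead of going unnoticed.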
Other tasks carried out by the run function are:
- update the variable self.adj_refdir for later adjoint simulations
- update the variable self.chain for later sub-periods and move necessary files to that directory; these files include for instance concentration fields at the last time step of the period, to be used as initial conditions for the next period.
- pycif.plugins.models.template.run(self, runsubdir, mode, workdir, ddi, nbproc=1, do_simu=True, approx_transf=False, ref_fwd_dir='', overlap=False, **kwargs)
Run the model in forward, tangent-linear or adjoint mode. This includes:
- executing the model external executable
- updating adj_refdir
- moving files needed for chained simulations to "{}/../".format(runsubdir)
- Note:
For models for which the adjoint is not coded, make sure to return a clear error if the run function is called in adj mode and with do_simu = True
- Args:
self: the model Plugin
runsubdir (str): working directory for the current run
mode (str): forward or backward
workdir (str): pyCIF working directory
do_simu (bool): re-run or not existing simulation
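The bookkeeping tasks of a run function can be sketched as follows. This is an illustration under stated assumptions, not the template code: the attribute has_adjoint, the file name end_state.txt and the location of the chain directory next to runsubdir are all hypothetical, and the call to the external executable is replaced by a placeholder that writes a fake end-of-period file.

```python
import datetime
import shutil
import tempfile
from pathlib import Path
from types import SimpleNamespace


def run(self, runsubdir, mode, workdir, ddi, do_simu=True, **kwargs):
    # Fail early and explicitly when no adjoint is available
    if mode == "adj" and do_simu and not getattr(self, "has_adjoint", False):
        raise NotImplementedError("The adjoint of this model is not implemented")

    runsubdir = Path(runsubdir)
    if do_simu:
        # Placeholder for the call to the external executable: a fake
        # end-of-period file is written instead of running a model
        (runsubdir / "end_state.txt").write_text("state at end of period\n")

    # Move the files needed by the next sub-period to the chain directory
    chain_dir = runsubdir.parent / "chain"
    chain_dir.mkdir(exist_ok=True)
    end_state = runsubdir / "end_state.txt"
    if end_state.exists():
        shutil.move(str(end_state), str(chain_dir / end_state.name))

    # Remember where forward outputs are stored, for later adjoint runs
    if mode in ("fwd", "tl"):
        self.adj_refdir = str(runsubdir.parent)


# Usage with a stand-in for the model plugin
with tempfile.TemporaryDirectory() as workdir:
    runsubdir = Path(workdir) / "2010-01-01"
    runsubdir.mkdir()
    model = SimpleNamespace()
    run(model, runsubdir, "fwd", workdir, ddi=datetime.datetime(2010, 1, 1))
    chained = (Path(workdir) / "chain" / "end_state.txt").exists()
    adj_refdir = model.adj_refdir
```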
See the run function of the model CHIMERE for a full example:
- pycif.plugins.models.chimere.run(self, runsubdir, mode, workdir, ddi, nbproc=1, do_simu=True, approx_transf=False, ref_fwd_dir='', overlap=False, **kwargs)
Run the CHIMERE model in forward mode
- Args:
self: the model Plugin
runsubdir (str): working directory for the current run
mode (str): forward or backward
workdir (str): pyCIF working directory
do_simu (bool): re-run or not existing simulation
ini_mapper#
The function ini_mapper defines the mapper giving meta-data about the model’s inputs and outputs.
The ini_mapper function dedicated to a model has the same structure as the ini_mapper functions for the transform Plugins.
Please consult the corresponding page for further details.
outputs2native and outputs2native_adj#
The functions outputs2native and outputs2native_adj read outputs and generate sensitivity to the outputs, respectively.
- pycif.plugins.models.template.outputs2native_adj(self, data2dump, input_type, di, df, runsubdir, mode='fwd', onlyinit=False, do_simu=True, check_transforms=False, **kwargs)
Dumps and/or saves information about outputs, so the model knows where to extract information.
In the present template, observations are simply saved for later use by outputs2native. If the model needs information to extract concentrations on the fly, the information in data2dump should be used. In particular, the columns i and j are the row and column of each observation in the domain. The column tstep indicates which time stamp the observation spans, relative to what is indicated in the variable output_intervals in ini_mapper.
The function is called by loadfromoutputs.adjoint.
- Args:
self: the model itself
data2dump (dict): a dictionary with concentration data for each component/tracer
input_type (str): the type of model outputs to be processed; this information is redundant with the components of the data2dump dictionary
di (datetime.datetime): starting date of the present sub-simulation
df (datetime.datetime): ending date of the present sub-simulation
runsubdir (str): path to the present sub-simulation work directory
mode (str): running mode; one of "fwd", "tl" and "adj"
onlyinit (bool): if True, means that the function is called during the initialization process of the observation vector
do_simu (bool): if False, means that the observation vector is retrieving information from a previous existing run; in that case, it may not be necessary to dump files
- pycif.plugins.models.template.outputs2native(self, data2dump, input_type, di, df, runsubdir, mode='fwd', onlyinit=False, check_transforms=False, **kwargs)
Reads outputs to pyCIF objects.
- Args:
self: the model itself
data2dump (dict): a dictionary with output data structure to be filled with correct data for every component/tracer
input_type (str): the type of model outputs to be processed; this information is redundant with the components of the data2dump dictionary
di (datetime.datetime): starting date of the present sub-simulation
df (datetime.datetime): ending date of the present sub-simulation
runsubdir (str): path to the present sub-simulation work directory
mode (str): running mode; one of "fwd", "tl" and "adj"
onlyinit (bool): if True, means that the function is called during the initialization process of the observation vector
do_simu (bool): if False, means that the observation vector is retrieving information from a previous existing run; in that case, it may not be necessary to dump files
- Return:
dict: a dictionary structured by the components/tracers to be extracted
Note:
The input data data2dump has a dictionary structure with two levels: component/tracer and date. This reads as:
data2dump = {
    (comp1, tracer1): {
        dd0: pd.DataFrame
        dd1: pd.DataFrame
        [...]
    }
}
In the output, the date level should be removed and only the outputs corresponding to the present simulation (di) should be included.
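The two-level structure of data2dump and the removal of the date level can be illustrated with the sketch below. It only shows the reshaping step: a real outputs2native would also fill the data frames with values read from the model output files. The helper select_period, the component and tracer names ("concs", "CH4") and the column values are hypothetical.

```python
import datetime

import pandas as pd


def select_period(data2dump, di):
    """Drop the date level, keeping only the entries of sub-simulation di."""
    return {
        trid: per_date[di]
        for trid, per_date in data2dump.items()
        if di in per_date
    }


dd0 = datetime.datetime(2010, 1, 1)
dd1 = datetime.datetime(2010, 2, 1)

# Two levels: (component, tracer), then starting date of the sub-simulation
data2dump = {
    ("concs", "CH4"): {
        dd0: pd.DataFrame({"i": [0, 1], "j": [2, 3], "tstep": [0, 5]}),
        dd1: pd.DataFrame({"i": [4], "j": [5], "tstep": [2]}),
    }
}

# One level only: (component, tracer) -> data frame of the present period
outputs = select_period(data2dump, dd0)
```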
native2inputs and native2inputs_adj#
The functions native2inputs and native2inputs_adj generate inputs for the model executable and read the sensitivity to the inputs as computed by the adjoint, respectively.
- pycif.plugins.models.template.native2inputs(self, datastore, input_type, datei, datef, runsubdir, mode='fwd', onlyinit=False, do_simu=True, check_transforms=False, **kwargs)
Converts data at the model data resolution stored in datastore to model compatible input files.
native2inputs will be called for every couple component/tracer as defined in the mapper.
- Args:
self: the model Plugin
input_type (str): the component name to be treated; please note that this information is redundant with the keys in datastore
datastore: data to convert
datei, datef: date interval of the sub-simulation
mode (str): running mode: one of 'fwd', 'adj' and 'tl'
runsubdir (str): sub-directory for the current simulation
workdir (str): the directory of the whole pyCIF simulation
Note:
The format of datastore is a mixture of the model mapper and of the pyCIF data format. For each component/tracer, the data itself is stored in the key data, and all the other keys come from the mapper, in case they are useful to dump inputs at the correct format.
Note:
If the input data was fully consistent with what the model expects, the data itself is not read by pyCIF. Instead, it is possible to directly link files defined by the key input_files (and defined in the fetch function of the corresponding flux plugin).
- pycif.plugins.models.template.native2inputs_adj(self, datastore, input_type, datei, datef, runsubdir, mode='fwd', check_transforms=False, **kwargs)
Reads adjoint sensitivities and formats them to the pyCIF data format.
Warning
This function is used only when the adjoint of the model is available.
- Args:
self: the model Plugin
input_type (str): one of 'flux'
datastore: data to convert if input_type == 'flux',
datei, datef: date interval of the sub-simulation
mode (str): running mode: one of 'fwd', 'adj' and 'tl'
runsubdir (str): sub-directory for the current simulation
workdir (str): the directory of the whole pyCIF simulation
make_auxiliary (optional)#
This function is called at the same time as native2inputs.
It generates all required information or files that are not data coming from the datavect and included in the mapper (such data are initialized by native2inputs).
Example of code:
- pycif.plugins.models.template.make_auxiliary(self, ddi, runsubdir, do_simu=True, mode='fwd', **kwargs)
Initialize every file or information needed by the model to run, excluding data that are initialized through the function native2inputs.
This includes name lists for Fortran, configuration files, etc.
All basic files related to the model should first be initialized in self.workdir/model at the initialization step in the function compile.
Hereafter, files are linked or copied to runsubdir from the reference ones in self.workdir/model.
- Note:
For configuration files, one should follow these basic rules:
- paths expected by the model should always point to the current runsubdir; thus the executable should be linked or copied in runsubdir; in addition, every extra file should be linked with a fixed name and the corresponding name should be given in the name-list or configuration file.
- as many model parameters as possible should be easily modifiable through the yaml configuration file; however, for some reasons, it may be preferable to limit the possibilities for pyCIF by keeping some parameters fixed; this question is up to the developer implementing one model
- Args:
self: the model plugin
ddi (datetime.datetime): the start date identifying the present simulation period
runsubdir (str): path to the current sub-simulation work directory
do_simu (bool): if False, the simulation does not need to be run, hence, in principle, no auxiliary data needs to be initialized
mode (str): the running mode to compute
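The rules above can be sketched as follows: the executable is copied from the reference directory under a fixed name, and a configuration file is written with paths pointing to runsubdir. This is only an illustration under stated assumptions; the file names model.exe and namelist.txt and the content of NAMELIST_TEMPLATE are hypothetical, and a SimpleNamespace stands in for the model plugin.

```python
import datetime
import shutil
import tempfile
from pathlib import Path
from string import Template
from types import SimpleNamespace

# Hypothetical configuration template; real models have their own format
NAMELIST_TEMPLATE = Template("start_date = $start\nrun_dir = $rundir\n")


def make_auxiliary(self, ddi, runsubdir, do_simu=True, mode="fwd", **kwargs):
    if not do_simu:
        return  # nothing to prepare when no simulation is run

    reference = Path(self.workdir) / "model"
    runsubdir = Path(runsubdir)
    runsubdir.mkdir(parents=True, exist_ok=True)

    # Copy the executable under a fixed name into the run directory
    shutil.copy2(reference / "model.exe", runsubdir / "model.exe")

    # Write a configuration file whose paths point to runsubdir
    namelist = NAMELIST_TEMPLATE.substitute(
        start=ddi.strftime("%Y%m%d%H"), rundir=runsubdir
    )
    (runsubdir / "namelist.txt").write_text(namelist)


# Usage with a fake reference directory and a stand-in for the plugin
with tempfile.TemporaryDirectory() as workdir:
    (Path(workdir) / "model").mkdir()
    (Path(workdir) / "model" / "model.exe").write_text("fake executable")
    model = SimpleNamespace(workdir=workdir)
    runsubdir = Path(workdir) / "fwd" / "2010-01-01"
    make_auxiliary(model, datetime.datetime(2010, 1, 1), runsubdir)
    namelist_text = (runsubdir / "namelist.txt").read_text()
    has_executable = (runsubdir / "model.exe").exists()
```

Copying is used here for portability; linking with os.symlink would save disk space where symbolic links are available.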
compile (optional)#
The function copies, links or compiles the necessary model executables and, if needed, auxiliary configuration files.
- pycif.plugins.models.template.compile(self)
The compile function initializes all model information and executables prior to any run. Files must be copied in $workdir/model.
This includes:
- copying the executable if it exists
Warning
It is recommended to copy executable files to make sure that later simulations in the present pyCIF computation use the same executable. Indeed, it can happen that one runs very long inversions in the background and carries on developments, forgetting about the background inversions, thus potentially breaking the background inversions, or worse, changing the result without error…
- copying the sources and compiling if no executable is around, or if explicitly required to re-compile
Note
As much as possible, the model should be compiled within pyCIF to guarantee the traceability of the options used for compiling and to deal with platform specificities through the platform Plugin (see the documentation of the platform Plugin).
- copying extra configuration files, e.g., templates for namelists, etc.
flushrun (optional)#
The function flushrun is called at the end of a simulation.
It cleans all temporary files that take disk space and are not necessary afterwards.
Arguments are:
- self: the model itself
- rundir: the run directory (with all the sub-period simulations)
- mode: the running mode; one of fwd, tl or adj.
The function returns nothing.
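A minimal sketch of such a cleaning function is given below, assuming that temporary files can be recognized by their extension. The pattern *.tmp and the file names are hypothetical; an actual model would list its own temporary files.

```python
import tempfile
from pathlib import Path


def flushrun(self, rundir, mode):
    # Hypothetical pattern: delete every *.tmp file in all sub-period
    # directories; the list is built first, then files are removed
    for path in list(Path(rundir).rglob("*.tmp")):
        if path.is_file():
            path.unlink()


# Usage on a fake run directory with one sub-period
with tempfile.TemporaryDirectory() as rundir:
    subdir = Path(rundir) / "2010-01-01"
    subdir.mkdir()
    (subdir / "scratch.tmp").write_text("temporary")
    (subdir / "output.nc").write_text("kept")
    flushrun(None, rundir, "fwd")
    remaining = sorted(p.name for p in subdir.iterdir())
```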
See the flushrun function of the model CHIMERE for a full example:
- pycif.plugins.models.chimere.flushrun(self, rundir, mode, transform_id, full_flush=True)
Cleaning the simulation directories to limit space usage