3. Elaborate the yaml for the CIF, using ready-made files
How to use the cheat-sheet for plugins
In the following, plugins have to be used and provided with specifications. The arguments can be found in the documentation of each plugin. To make access to the plugins easier, the cheat-sheet shows them sorted by type: the various types appear in the left-most column (e.g. chemistry, controlvect, fields). For each type, the available plugins are listed with their name and version. Note that stating the name and version of a plugin is mandatory, whereas stating its type is not always necessary.
This section must contain the five arguments shown in the example:
verbose gives the degree of verbosity of the CIF, with 1 for basic information and 2 for debugging
workdir is the working directory, which will be created by the CIF and used for executing and storing all the relevant inputs and outputs. Choose somewhere with enough disk space.
logfile is the name of the file where the logs of the CIF are written. This file is saved in the working directory.
datei / datef are the initial and final dates of the period to simulate. Use one of the following compatible formats for the dates: YYYY-mm-dd or YYYY-mm-dd HH:mm:ss
verbose: 2
logfile: pycif.logtest
workdir: /tmp/CIF//.tox/py38/tmp/fwd_ref_chimere
datei: 2011-03-22 00:00:00
datef: 2011-03-22 09:00:00
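Both date formats are standard and can be checked quickly, for instance with pandas (used here purely for illustration; pyCIF's own parsing may differ):

```python
import pandas as pd

# Both accepted formats parse to timestamps; the time defaults to
# midnight when only YYYY-mm-dd is given.
datei = pd.Timestamp('2011-03-22')            # YYYY-mm-dd
datef = pd.Timestamp('2011-03-22 09:00:00')   # YYYY-mm-dd HH:mm:ss

print(datef - datei)  # 0 days 09:00:00
```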
In this section of the yaml, it is possible to define anchors to be used in the rest of the file.
Here, a forward simulation is the chosen mode for running the model.
At the key-word for the class (mode), the various available plugins are listed in the cheat-sheet. For the chosen plugin, here the one for running a forward simulation, the name and version are provided and the requirements are listed. The full description of the class mode gives access to its arguments.
For forward, there is no mandatory argument to specify but a few optional arguments can be used; the template yaml at the end of the page provides a full list of them.
In our example below, only reload_results is used, so as not to have to recompute the whole simulation in case of an interruption.
mode:
  plugin:
    name: forward
    version: std
  reload_results: true
Our chemistry-transport model works from the flux space to the concentration space, which corresponds to the standard choice of obsoperator. For this standard obsoperator, there is no mandatory argument to specify but a few optional arguments can be used, as shown in the full template yaml. In our example,
obsoperator:
  plugin:
    name: standard
    version: std
  autorestart: True
The requirements for our standard obsoperator are listed in the cheat-sheet.
So far, there is only the standard possibility for controlvect. For this standard controlvect, there is no mandatory argument to specify but a few optional arguments can be used. In our example, no optional argument is activated (the default values will apply).
controlvect:
  plugin:
    name: standard
    version: std
The requirements for the standard controlvect are listed in the cheat-sheet.
Here, it is the plugin for CHIMERE. The usual user’s choices for running a CHIMERE simulation (see CHIMERE documentation) are either in the mandatory arguments or in the optional arguments, for which default values are specified. Be sure to check all the mandatory AND OPTIONAL arguments to fully set up the simulation as wanted. It must be consistent with e.g. the chemistry (see sections Locate the input files provided directly for CHIMERE and Chemistry (chemistry)) and domain (see sections Locate the input files provided directly for CHIMERE and Domain (domain)).
model:
  auto-recompile: true
  dir_sources: /tmp/CIF//model_sources/chimereGES
  direxec: /tmp/PYCIF_DATA_TEST/CHIMERE/CHIMERE_executables
  ichemstep: 1
  ideepconv: 0
  nivout: 17
  nlevemis: 17
  nmdoms: 1
  nphour_ref: 6
  nzdoms: 1
  periods: 3H
  plugin:
    name: CHIMERE
    version: std
  usechemistry: 1
  usedepos: 1
  usewetdepos: 1
The requirements for CHIMERE are chemistry and a set of components (corresponding to the inputs of CHIMERE itself), to be detailed in the datavect section.
To avoid useless runs, the CIF only runs a simulation up to the time where observations are available. The standard obsvect must therefore be initialized. See section Component for observations for how to provide quick dummy observation data.
obsvect:
  dump: true
  plugin:
    name: standard
    version: std
Its requirements, listed in the cheat-sheet, include datavect.
Specify the computing platform on which to run, so that the CIF can choose the right configuration and perform targeted operations (e.g. module load the relevant modules). Here, the example is set at LSCE, on the obelix cluster.
platform:
  plugin:
    name: LSCE
    version: obelix
Its only requirement is listed in the cheat-sheet.
Specify a domain for CHIMERE (see also the cheat-sheet) consistently with the pre-computed input files (see step 2). The files defining the domain can be stored directly in the directory repgrid, or symbolic links can be used.
domain:
  plugin:
    name: CHIMERE
    version: std
  repgrid: a_path_for_CHIMERE_COORD_definition_files/
  domid: MYDOMAIN
  nlev: 20
  p1: 997
  pmax: 200
  pressure_unit: hPa
The only available type of chemical scheme so far is for photolysis with tabulated Js, the chemical scheme being pre-computed (see step 2).
chemistry:
  plugin:
    name: CHIMERE
    version: gasJtab
  schemeid: name.chemistry
  dir_precomp: the_path_for_the_directory_of_which_chemical_scheme_named_above_is_a_subdir/
The data vector contains ingredients listing the input data for the model (e.g. emission fluxes) and for the comparison to observations (e.g. concentration data), which controlvect, obsoperator and obsvect will use for building the set-up to run. So far, only the standard plugin is available.
datavect:
  plugin:
    name: standard
    version: std
For the first forward simulation, its components are the requirements of the model's plugin which are not already taken care of in the previous sections of the yaml (i.e. excluding chemistry), plus a component for the (dummy) observation data.
The components which are used to provide the model with its inputs are to be chosen among the available datastreams, which are recognized by the model's plugin so that it is able to pre-process or simply fetch its inputs. For example, for CHIMERE, the plugin expects flux to provide the information on the input emissions that are not to be interpolated within the hour, i.e. emissions to put into the AEMISSIONS file.
For each of its datastream components, datavect expects a minimum of three pieces of information:
dir, the directory where the data relevant to this component is available
file, which gives either a fixed file name or a general format for a set of files (with generic year, month, day, hour, etc.)
a mandatory (except for initial conditions) file_freq, which gives the time period covered by each data file. Use the pandas format for these durations, e.g. 1D, 120H, etc.
To distinguish the various boundary conditions (lateral, top, initial), the comp_type must also be specified.
Note that these arguments are linked to datavect and not to a given plugin (see also optional general arguments for datavect here).
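The file_freq durations follow the pandas frequency/timedelta syntax; a minimal sketch of how such strings behave (illustration only, not pyCIF code):

```python
import pandas as pd

# Durations such as 1D or 120H are plain pandas timedeltas
print(pd.to_timedelta('1D'))    # 1 days 00:00:00
print(pd.to_timedelta('120H'))  # 5 days 00:00:00

# With file_freq = 3H, files covering the example period
# 2011-03-22 00:00 to 09:00 would start at these dates:
starts = pd.date_range('2011-03-22 00:00', '2011-03-22 09:00', freq='3H')
print(list(starts.strftime('%Y%m%d%H')))
```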
When a plugin is used by a component to deal with the specified files, its name, version AND type must be specified, as well as its own arguments.
The datastream components dealing with the required inputs of the model direct to plugins which are able to deal with the inputs required by CHIMERE: meteo (I) for the meteorological inputs, emission fluxes (II and III), boundary conditions (IV) and initial conditions (IV).
With pre-computed METEO.nc files, the specifications for the meteo component are very simple:
datavect:
  components:
    meteo:
      dir: /tmp/PYCIF_DATA_TEST/CHIMERE/ACADOK
      file: 'METEO.%Y%m%d%H.3.nc'
      file_freq: 3H
      plugin:
        name: CHIMERE
        version: std
In this case, the names of the METEO.nc files match the format used by CHIMERE. It is also possible to vary the template, e.g. by adapting the date placeholders to the actual file names.
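The generic date fields in file appear to follow Python's strftime placeholders (%Y, %m, %d, %H); a quick sketch of how such a template expands for a given file period:

```python
import datetime

template = 'METEO.%Y%m%d%H.3.nc'
date = datetime.datetime(2011, 3, 22, 0)

# Substituting the date of one file period into the template
print(date.strftime(template))  # METEO.2011032200.3.nc
```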
With pre-computed files also available for fluxes, biofluxes, inicond, latcond and topcond, they can be specified in the same simple way. The pre-computed files must be consistent with one another, with the domain, with the PyCIF parameters of the simulation and with the choices made for the model, particularly with the optional argument periods (default 1D = 24 hours).
In the same manner as for the meteorology, it is simple to specify the use of pre-computed AEMISSIONS.nc files with a ready-made file dealt with by the plugin for CHIMERE's fluxes, with its type, name and version taken from the cheat-sheet:
datavect:
  plugin:
    name: standard
    version: std
  components:
    meteo:
      dir: directory_containing_METEO.YYYYMMDDHH.*.nc_files
      file: METEO.%Y%m%d%H.X.nc
      plugin:
        name: CHIMERE
        version: std
        type: meteo
      file_freq: XH
    flux:
      dir: directory_containing_AEMISSIONS.YYYYMMDDHH.*.nc_files
      file: AEMISSIONS.%Y%m%d%H.X.nc
      plugin:
        name: CHIMERE
        version: AEMISSIONS
        type: flux
      file_freq: XH
In this case, the CIF expects to find AEMISSIONS.nc files formatted exactly as CHIMERE uses them and containing all the species listed in ANTHROPIC.
It is also possible to combine different ready-made files for the various emitted species which do not require a sub-hourly interpolation. These species must be listed as flux and their names must match the names in ANTHROPIC. For each parameter, it is possible to individualize everything, as shown in the tutorial for more elaborate inputs.
bioflux specifications follow the same principles as flux. If no fluxes with a sub-hourly interpolation are required (useemisb is False in model), this component can be omitted. If it is omitted while useemisb is True, an error is raised.
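The rule above can be summarized by a small sketch (this is not pyCIF code; the function name is hypothetical):

```python
def check_bioflux(useemisb: bool, components: dict) -> None:
    """Raise if sub-hourly fluxes are requested but no bioflux is given."""
    if useemisb and 'bioflux' not in components:
        raise ValueError("useemisb is True in model but the bioflux "
                         "component is missing from datavect")

# bioflux omitted and useemisb False: accepted
check_bioflux(False, {'meteo': {}, 'flux': {}})

# bioflux omitted while useemisb True: an error is raised
try:
    check_bioflux(True, {'meteo': {}, 'flux': {}})
except ValueError as err:
    print(err)
```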
In the same manner as for AEMISSIONS, it is simple to specify the use of pre-computed BEMISSIONS.nc files, making use of the plugin for CHIMERE’s fluxes:
datavect:
  plugin:
    name: standard
    version: std
  components:
    meteo:
      dir: directory_containing_METEO.YYYYMMDDHH.*.nc_files
      file: METEO.%Y%m%d%H.X.nc
      plugin:
        name: CHIMERE
        version: std
        type: meteo
      file_freq: XH
    flux:
      dir: directory_containing_AEMISSIONS.YYYYMMDDHH.*.nc_files
      file: AEMISSIONS.%Y%m%d%H.X.nc
      plugin:
        name: CHIMERE
        version: AEMISSIONS
        type: flux
      file_freq: XH
    bioflux:
      dir: directory_containing_BEMISSIONS.YYYYMMDDHH.*.nc_files
      file: BEMISSIONS.%Y%m%d%H.X.nc
      plugin:
        name: CHIMERE
        version: AEMISSIONS
        type: flux
      emis_type: bio
      file_freq: XH
Note that the emis_type must here be explicit so that BEMISSIONS files are fetched (and not AEMISSIONS files). If combining various files for different parameters, their names must match BIOGENIC.
The inicond, latcond and topcond components are characterized by their comp_type; otherwise, their specifications follow the same principles as the components above.
datavect:
  plugin:
    name: standard
    version: std
  components:
    meteo:
      dir: directory_containing_METEO.YYYYMMDDHH.*.nc_files
      file: METEO.%Y%m%d%H.X.nc
      plugin:
        name: CHIMERE
        version: std
        type: meteo
      file_freq: XH
    flux:
      dir: directory_containing_AEMISSIONS.YYYYMMDDHH.*.nc_files
      file: AEMISSIONS.%Y%m%d%H.X.nc
      plugin:
        name: CHIMERE
        version: AEMISSIONS
        type: flux
      file_freq: XH
    inicond:
      dir: directory_containing_IC.YYYYMMDDHH.*.nc_files
      file: INI_CONCS.0.nc  # XX mandatory name???XXXX
      plugin:
        name: CHIMERE
        version: icbc
        type: field
      comp_type: inicond
    latcond:
      dir: directory_containing_BC.YYYYMMDDHH.*.nc_files
      file: BOUN_CONCS.%Y%m%d%H.X.nc
      plugin:
        name: CHIMERE
        version: icbc
        type: field
      file_freq: XH
      comp_type: latcond
    topcond:
      dir: directory_containing_BC.YYYYMMDDHH.*.nc_files
      file: BOUN_CONCS.%Y%m%d%H.X.nc
      plugin:
        name: CHIMERE
        version: icbc
        type: field
      file_freq: XH
      comp_type: topcond
There are three options to compute CHIMERE outputs.
1. Force the computation of the observation operator without observations: XXX under construction: with force_full_operator? XXX
2. Generate random observations: this is done by specifying information in the yaml to generate random surface measurements of a set of parameters with the plugin measurements. For example, for one measured species only:
concs:
  parameters:
    S1:
      plugin:
        name: random
        type: measurements
        version: param
      frequency: '1H'
      nstations: 5
      duration: '1H'
      random_subperiod_shift: True
      zmax: 100
      seed: True
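As a rough illustration of what these options mean (this is not pyCIF's actual generator; all names below are made up for the sketch): nstations random surface stations, one record per frequency sub-period, at altitudes below zmax, reproducibly seeded:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)   # seed: True -> reproducible draws
nstations, zmax = 5, 100          # as in the yaml above

# hourly sub-periods (frequency: '1H') within the 9-hour example window
dates = pd.date_range('2011-03-22 00:00', '2011-03-22 08:00', freq='1H')

records = pd.DataFrame({
    'date': np.repeat(dates, nstations),
    'station': [f'ST{i}' for _ in dates for i in range(nstations)],
    'alt': rng.uniform(0, zmax, len(dates) * nstations),  # below zmax
})
print(len(records))  # 9 dates x 5 stations = 45 records
```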
3. Make your own observation file
The component named concs is used for surface data; other types are also available, as described in the standard data vector.
Here, an example is given for surface observations with the matching yaml and a python code to generate a monitor.nc file with one observation.
concs:
  parameters:
    S1:
      dir: /tmp/PYCIF_DATA_TEST/CHIMERE/ACADOK
      file: dummy_monitor.nc
The parameters' names must be in the ACTIVE_SPECIES file.
To simply run a forward simulation, the dummy monitor file can be filled in with one observation dated at the final hour of the period. This can be done based on the following short python script:
import pandas as pd
import datetime

# Put here the elements of datef
yearf = 2011
monthf = 2
dayf = 3
hourf = 0
minutef = 0
secondf = 0

# Put here the coordinates of any point in the domain
# (see repgrid in domain, file HCOORD)
lat0 = 1.2
lon0 = 48.3

list_basic_cols = [
    'date', 'duration', 'station', 'network', 'parameter',
    'lon', 'lat', 'obs', 'obserror', 'alt'
]

datef = datetime.datetime(
    year=yearf, month=monthf, day=dayf,
    hour=hourf, minute=minutef, second=secondf)

data = pd.DataFrame(columns=list_basic_cols)
data['date'] = [datef]
data = data.assign(duration=1.)
data = data.assign(station='dummy')
data = data.assign(network='none')
data = data.assign(parameter='NONE')
data = data.assign(lon=lon0)
data = data.assign(lat=lat0)
data = data.assign(obs=500.)
data = data.assign(obserror=5.)
data = data.assign(alt=1.)
data = data.to_xarray()
data.to_netcdf('monitor_obs_for_simu_ID.nc')
Note that when no observation is available, no error is raised: the CIF indicates that the forward mode has been successfully executed, even though CHIMERE did not actually run.