3. Elaborate the yaml for the CIF, using ready-made files

Important

How to use the cheat-sheet for plugins

In the following, plugins must be chosen and given specifications. Their arguments can be found in the documentation of each plugin. To make access to the plugins easier, the cheat-sheet shows them sorted by type: the types are listed in the left-most column (e.g. chemistry, controlvect, fields). For each type, the available plugins are listed with their name and version. Note that stating the name and version of a plugin is mandatory, whereas stating its type is not always necessary.

3.1. Section for PyCIF parameters

This section must contain the five arguments shown in the example:

  • verbose gives the degree of verbosity of the CIF, with 1 for basic information and 2 for debugging

  • workdir is the working directory, which will be created by the CIF and used for executing and storing all the relevant inputs and outputs. Choose a location with enough disk space.

  • logfile is the name of the file where the logs of the CIF are written. This file is to be saved in the workdir.

  • datei and datef are the initial and final dates of the period to simulate. Dates must be given in one of the following formats: YYYY-mm-dd or YYYY-mm-dd HH:mm:ss.

#####################
# pyCIF config file #
#####################

# Define here all parameters for pyCIF following YAML syntax
# For details on YAML syntax, please see:
# http://docs.ansible.com/ansible/latest/YAMLSyntax.html

###############################################################################
# pyCIF parameters

verbose: 2
logfile: pycif.logtest
workdir: /tmp/CIF//.tox/py38/tmp/fwd_ref_chimere
datei: 2011-03-22 00:00:00
datef: 2011-03-22 09:00:00

In this section of the yaml, it is also possible to define YAML anchors (&name) to be reused in the rest of the file (*name).
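Since datei and datef only accept the two formats listed above, a typo in a date is easy to miss. The following minimal Python sketch checks a pair of dates before launching pyCIF; `parse_cif_date` and `check_period` are hypothetical helpers (not part of pyCIF) that simply mirror the two formats quoted above.

```python
from datetime import datetime

# The two date formats listed above: YYYY-mm-dd HH:mm:ss and YYYY-mm-dd
DATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_cif_date(text):
    """Hypothetical helper: parse a datei/datef string, or raise ValueError."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"unsupported date format: {text!r}")


def check_period(datei, datef):
    """Return the length of the simulated period, checking that datei < datef."""
    start, end = parse_cif_date(datei), parse_cif_date(datef)
    if start >= end:
        raise ValueError("datei must be strictly earlier than datef")
    return end - start


# Values from the example above: a 9-hour period
print(check_period("2011-03-22 00:00:00", "2011-03-22 09:00:00"))  # -> 9:00:00
```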

3.2. Mode (mode)

Here, a forward simulation is the chosen mode for running the model. The cheat-sheet lists the available plugins under the keyword of the class (mode). For the chosen plugin, here the one for running a forward simulation, it gives the name and version of the plugin and lists its requirements. The full description of the class mode gives access to its arguments. For forward, there is no mandatory argument to specify, but a few optional arguments can be used; the template yaml at the end of the page provides a full list of them. In our example below, only reload_results is used, so as not to recompute the whole simulation in case of an interruption.

# http://community-inversion.eu/documentation/plugins/modes/forward.html

mode:
  plugin:
    name: forward
    version: std

  reload_results: true

The requirements for our forward mode are Observation operator (obsoperator) and Control vector (controlvect). They are to be specified in the next sections of the yaml file.

3.3. Observation operator (obsoperator)

Our chemistry-transport model works from the flux space to the concentration space, which corresponds to the standard choice of obsoperator. For this standard obsoperator, there is no mandatory argument to specify, but a few optional arguments can be used, as shown in the full template yaml. In our example, autorestart is used.

obsoperator:
  plugin:
    name: standard
    version: std
  autorestart: True

The requirements for our standard obsoperator are controlvect, datavect, model, obsvect and platform.

3.4. Control vector (controlvect)

So far, only the standard controlvect is available. For this standard controlvect, there is no mandatory argument to specify, but a few optional arguments can be used. In our example, no optional argument is activated (the default values apply).

controlvect:
  plugin:
    name: standard
    version: std

The requirements for the standard controlvect are datavect, domain, model and obsvect.

3.5. Model (model)

Here, the plugin for CHIMERE is used. The usual user’s choices for running a CHIMERE simulation (see the CHIMERE documentation) are set either in the mandatory arguments or in the optional arguments, for which default values are specified. Be sure to check all the mandatory AND OPTIONAL arguments to fully set up the simulation as wanted. The set-up must be consistent with, e.g., the chemistry (see sections Locate the input files provided directly for CHIMERE and Chemistry (chemistry)) and the domain (see sections Locate the input files provided directly for CHIMERE and Domain (domain)).

# http://community-inversion.eu/documentation/plugins/models/chimere.html

model:
  plugin:
    name: CHIMERE
    version: std

  auto-recompile: true
  dir_sources: /tmp/CIF//model_sources/chimereGES
  direxec: /tmp/PYCIF_DATA_TEST/CHIMERE/CHIMERE_executables
  ichemstep: 1
  ideepconv: 0
  nivout: 17
  nlevemis: 17
  nmdoms: 1
  nphour_ref: 6
  nzdoms: 1
  periods: 3H
  usechemistry: 1
  usedepos: 1
  usewetdepos: 1

The requirements for CHIMERE are domain, chemistry and a set of components (corresponding to the inputs of CHIMERE itself) to be detailed in datavect: meteo, flux, bioflux, latcond, topcond and inicond.
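The periods argument above is set to 3H, while the dates of section 3.1 span 9 hours. Assuming that periods gives the length of each CHIMERE sub-simulation (default 1D, see the note in section 3.10.1), the run should therefore be split into three 3-hour sub-periods. The following minimal Python sketch only illustrates this arithmetic; `parse_freq` and `split_periods` are hypothetical helpers, not pyCIF functions, and `parse_freq` understands only simple offsets such as 3H or 1D rather than the full pandas syntax.

```python
from datetime import datetime, timedelta

# Hypothetical parser for simple pandas-like offsets such as "3H" or "1D"
UNITS = {"H": timedelta(hours=1), "D": timedelta(days=1)}


def parse_freq(freq):
    """Convert e.g. '3H' into timedelta(hours=3); a bare 'H' means one hour."""
    count, unit = freq[:-1] or "1", freq[-1].upper()
    return int(count) * UNITS[unit]


def split_periods(datei, datef, periods="1D"):
    """List the (start, end) sub-periods covering [datei, datef]."""
    step = parse_freq(periods)
    bounds = []
    start = datei
    while start < datef:
        end = min(start + step, datef)  # the last sub-period may be shorter
        bounds.append((start, end))
        start = end
    return bounds


# Dates of section 3.1 with periods: 3H
for start, end in split_periods(datetime(2011, 3, 22, 0), datetime(2011, 3, 22, 9), "3H"):
    print(start, "->", end)
```

If periods does not divide the simulated period evenly, the sketch simply shortens the last sub-period; how pyCIF handles that case is not covered here.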

3.6. Observation Vector (obsvect)

To avoid useless runs, the CIF only runs a simulation as far as observations are available. The standard obsvect must therefore be set up, together with some observation data. See section Component for observations for how to provide quick dummy observation data.

# http://community-inversion.eu/documentation/plugins/obsvects/standard.html

obsvect:
  plugin:
    name: standard
    version: std

  dump: true

Its requirements are datavect and the model.

3.7. Platform (platform)

This section specifies the computing platform on which to run, so that the CIF can choose the right configuration and perform targeted operations, e.g. loading the relevant modules with module load. Here, the example is set up for the obelix cluster at LSCE.

platform:
  plugin:
    name: LSCE
    version: obelix

The only requirement is the model.
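Taken together, the requirements listed in sections 3.2 to 3.7 form a small dependency graph. The Python sketch below transcribes them by hand into a dictionary (it does not query pyCIF) and walks the graph from mode; the result covers every section of this page, which is why each of them has to appear in the yaml.

```python
# Requirements transcribed by hand from sections 3.2 to 3.7 (not read from pyCIF)
REQUIREMENTS = {
    "mode": ["obsoperator", "controlvect"],
    "obsoperator": ["controlvect", "datavect", "model", "obsvect", "platform"],
    "controlvect": ["datavect", "domain", "model", "obsvect"],
    "model": ["domain", "chemistry", "datavect"],
    "obsvect": ["datavect", "model"],
    "platform": ["model"],
}


def required_sections(root, graph=REQUIREMENTS):
    """Return every section reachable from `root` (depth-first traversal)."""
    seen = set()
    stack = [root]
    while stack:
        section = stack.pop()
        for dep in graph.get(section, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


print(sorted(required_sections("mode")))
# -> ['chemistry', 'controlvect', 'datavect', 'domain', 'model', 'obsoperator', 'obsvect', 'platform']
```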

3.8. Domain (domain)

Specify a domain for CHIMERE (see also the cheat-sheet) consistent with the pre-computed input files (see step 2). The files defining the domain can either be stored directly in the directory repgrid or be symbolic links.

domain :
  plugin:
    name    : CHIMERE
    version : std
  repgrid: a_path_for_CHIMERE_COORD_definition_files/
  domid : MYDOMAIN
  nlev: 20
  p1: 997
  pmax: 200
  pressure_unit: hPa
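If the domain files are kept elsewhere, symbolic links can be created in repgrid rather than copying the files. A minimal Python sketch, assuming a hypothetical source directory that holds only the files defining the domain (their exact names, e.g. the HCOORD file, depend on the CHIMERE set-up and are not checked here); `link_domain_files` is not a pyCIF function.

```python
from pathlib import Path


def link_domain_files(source_dir, repgrid):
    """Symlink every regular file of `source_dir` into `repgrid`.

    Existing entries in `repgrid` are left untouched.
    Returns the corresponding paths in `repgrid`.
    """
    source_dir, repgrid = Path(source_dir), Path(repgrid)
    repgrid.mkdir(parents=True, exist_ok=True)
    links = []
    for src in sorted(source_dir.iterdir()):
        if not src.is_file():
            continue  # skip sub-directories
        dst = repgrid / src.name
        if not (dst.exists() or dst.is_symlink()):
            dst.symlink_to(src.resolve())  # absolute target, so the link survives a cd
        links.append(dst)
    return links
```

Calling it with the directory holding the domain files and the value given for repgrid creates the links once; running it again does not overwrite anything.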

3.9. Chemistry (chemistry)

So far, the only available type of chemical scheme uses tabulated photolysis rates (Js), the chemical scheme itself being pre-computed (see step 2).

chemistry :
  plugin:
    name: CHIMERE
    version: gasJtab
  schemeid: name.chemistry
  dir_precomp: the_path_for_the_directory_of_which_chemical_scheme_named_above_is_a_subdir/
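As dir_precomp must be the directory of which the scheme named in schemeid is a sub-directory, a quick check before running can avoid a failed simulation. A minimal sketch; `precomputed_scheme_dir` is a hypothetical helper that only checks that the sub-directory exists, not its contents.

```python
from pathlib import Path


def precomputed_scheme_dir(dir_precomp, schemeid):
    """Return dir_precomp/schemeid, raising FileNotFoundError if it is missing."""
    scheme_dir = Path(dir_precomp) / schemeid
    if not scheme_dir.is_dir():
        raise FileNotFoundError(
            f"chemical scheme {schemeid!r} is not a sub-directory of {str(dir_precomp)!r}"
        )
    return scheme_dir
```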

3.10. Data vector (datavect)

The data vector lists the ingredients, i.e. the input data for the model (e.g. emission fluxes) and for the comparison to observations (e.g. concentration data), which controlvect, obsoperator and obsvect use to build the set-up to run.

So far, there is only the standard datavect.

datavect :
  plugin:
    name: standard
    version: std

For the first forward simulation, its components are the requirements of the model’s plugin which are not already taken care of in the previous sections of the yaml (i.e. excluding domain and chemistry) plus a component for the (dummy) observation data.

3.10.1. CHIMERE usual inputs

The components used to provide the model with its inputs must be chosen among the available datastreams, which are recognized by the model’s plugin so that it can pre-process or simply fetch its inputs. For example, for CHIMERE, the plugin expects flux to provide the emissions that are not interpolated within the hour, i.e. the emissions to be put into the AEMISSIONS file.

For each of its datastream components, datavect expects a minimum of three pieces of information:

  1. a mandatory dir, for the directory where the data relevant to this component is available

  2. a mandatory file which gives either a fixed file name or a general format for a set of files (with generic year, month, day, hour, etc).

  3. a mandatory (except for initial conditions) file_freq, which gives the time period covered by each data file. Use the pandas format for these durations, e.g. 1D, 120H, etc.
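To see how file and file_freq work together, the sketch below lists the file names one would expect for the meteo component used in the examples of this section (file: 'METEO.%Y%m%d%H.3.nc', file_freq: 3H) over the period of section 3.1. It assumes that the date directives in file are expanded as in Python's strftime and that one file starts every file_freq from datei; `expected_files` is a hypothetical helper, not part of pyCIF, and only handles simple H or D offsets.

```python
from datetime import datetime, timedelta


def expected_files(template, file_freq, datei, datef):
    """List the file names implied by `file` and `file_freq` over [datei, datef).

    Assumptions: one file starts every file_freq from datei, and file_freq
    is a simple hour ("3H") or day ("1D") offset.
    """
    units = {"H": timedelta(hours=1), "D": timedelta(days=1)}
    step = int(file_freq[:-1] or "1") * units[file_freq[-1].upper()]
    names = []
    date = datei
    while date < datef:
        names.append(date.strftime(template))  # expand %Y, %m, %d, %H
        date += step
    return names


# Values of the meteo component and of the pyCIF parameters in this tutorial
for name in expected_files("METEO.%Y%m%d%H.3.nc", "3H",
                           datetime(2011, 3, 22, 0), datetime(2011, 3, 22, 9)):
    print(name)
```

With these values, it lists METEO.2011032200.3.nc, METEO.2011032203.3.nc and METEO.2011032206.3.nc.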

To distinguish the various boundary conditions (lateral, top, initial), the comp_type must also be specified.

Note that these arguments are linked to datavect and not to a given plugin (see also optional general arguments for datavect here).

When a plugin is used by a component to deal with the specified files, its name, version AND type must be specified, as well as its own arguments.

The datastream components dealing with the required inputs of the model point to plugins able to handle the inputs required by CHIMERE: meteo (item 1 below) for the meteorological inputs, emission fluxes (items 2 and 3), and boundary and initial conditions (item 4).

  1. With pre-computed METEO.nc files, the specifications for the meteo component are very simple:

    1. the minimum information dir and file which direct to ready-made METEO.nc files, as well as the matching file_freq.

    2. the plugin for CHIMERE’s meteo ready-made files (see also cheat-sheet) which deals with these files

    datavect:
      components:
        meteo:
          plugin:
            name: CHIMERE
            version: std
          dir: /tmp/PYCIF_DATA_TEST/CHIMERE/ACADOK
          file: 'METEO.%Y%m%d%H.3.nc'
          file_freq: 3H
    

    In this case, the names of the METEO.nc files match the format used by CHIMERE. It is also possible to vary the template e.g. file: METEO_some_etiket.%Y%m%d%H.X.nc

  2. With pre-computed files also available for fluxes, biofluxes, inicond, latcond and topcond, they can be specified in the same simple way.

    Note

    The pre-computed files must be consistent together and with the domain, the PyCIF parameters of the simulation and the choices made for the model, particularly with the optional argument periods (default 1D = 24 hours).

    In the same manner as for the meteorology, it is simple to specify the use of pre-computed AEMISSIONS.nc files with ready-made dir and file handled by the plugin for CHIMERE’s fluxes, with its type, name and version taken from the cheat-sheet:

    datavect:
      plugin:
        name: standard
        version: std
      components:
        meteo:
          dir: directory_containing_METEO.YYYYMMDDHH.*.nc_files
          file: METEO.%Y%m%d%H.X.nc
          plugin:
            name: CHIMERE
            version: std
            type: meteo
          file_freq: XH
        flux:
          dir: directory_containing_AEMISSIONS.YYYYMMDDHH.*.nc_files
          file: AEMISSIONS.%Y%m%d%H.X.nc
          plugin:
            name: CHIMERE
            version: AEMISSIONS
            type: flux
          file_freq: XH
    

    In this case, the CIF expects to find AEMISSIONS.nc files formatted exactly as CHIMERE uses them and containing all the species listed in ANTHROPIC.

    It is also possible to combine different ready-made files for the various emitted species which do not require a sub-hourly interpolation. These species must be listed as parameters of flux and their names must match the names in ANTHROPIC. For each parameter, it is possible to individualize everything, as shown in the tutorial for more elaborated inputs.

  3. The bioflux specifications follow the same principles as flux. If no fluxes with a sub-hourly interpolation are required (useemisb is False in model), this component can be omitted. If it is omitted while useemisb is True, an error is raised.

    In the same manner as for AEMISSIONS, it is simple to specify the use of pre-computed BEMISSIONS.nc files, making use of the plugin for CHIMERE’s fluxes:

    datavect:
      plugin:
        name: standard
        version: std
      components:
        meteo:
          dir: directory_containing_METEO.YYYYMMDDHH.*.nc_files
          file: METEO.%Y%m%d%H.X.nc
          plugin:
            name: CHIMERE
            version: std
            type: meteo
          file_freq: XH
        flux:
          dir: directory_containing_AEMISSIONS.YYYYMMDDHH.*.nc_files
          file: AEMISSIONS.%Y%m%d%H.X.nc
          plugin:
            name: CHIMERE
            version: AEMISSIONS
            type: flux
          file_freq: XH
        bioflux:
          dir: directory_containing_BEMISSIONS.YYYYMMDDHH.*.nc_files
          file: BEMISSIONS.%Y%m%d%H.X.nc
          plugin:
            name: CHIMERE
            version: AEMISSIONS
            type: flux
          emis_type: bio
          file_freq: XH
    

    Note that emis_type must be given explicitly here so that BEMISSIONS files are fetched (and not AEMISSIONS files). If combining various files for different parameters, their names must match those listed in BIOGENIC.

  4. The inicond, latcond and topcond components are characterized by their comp_type; otherwise, their specifications follow the same principles as flux.

    In the same manner as for AEMISSIONS, it is simple to specify the use of pre-computed INI_CONCS.nc files and BOUN_CONCS.nc files, using the plugin for CHIMERE’s fields:

    datavect:
      plugin:
        name: standard
        version: std
      components:
        meteo:
          dir: directory_containing_METEO.YYYYMMDDHH.*.nc_files
          file: METEO.%Y%m%d%H.X.nc
          plugin:
            name: CHIMERE
            version: std
            type: meteo
          file_freq: XH
        flux:
          dir: directory_containing_AEMISSIONS.YYYYMMDDHH.*.nc_files
          file: AEMISSIONS.%Y%m%d%H.X.nc
          plugin:
            name: CHIMERE
            version: AEMISSIONS
            type: flux
          file_freq: XH
        inicond:
          dir: directory_containing_IC.YYYYMMDDHH.*.nc_files
          file: INI_CONCS.0.nc
          plugin:
            name: CHIMERE
            version: icbc
            type: field
          comp_type: inicond
        latcond:
          dir: directory_containing_BC.YYYYMMDDHH.*.nc_files
          file: BOUN_CONCS.%Y%m%d%H.X.nc
          plugin:
            name: CHIMERE
            version: icbc
            type: field
          file_freq: XH
          comp_type: latcond
        topcond:
          dir: directory_containing_BC.YYYYMMDDHH.*.nc_files
          file: BOUN_CONCS.%Y%m%d%H.X.nc
          plugin:
            name: CHIMERE
            version: icbc
            type: field
          file_freq: XH
          comp_type: topcond
    

    In this case, all species listed in ACTIVE_SPECIES are fetched in the same input files. To combine various pre-computed files, see the tutorial for more elaborated inputs.
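Before launching a simulation, the set of components can be checked against the requirements of the CHIMERE plugin. The sketch below transcribes them by hand from section 3.5, with bioflux only required when useemisb is True (item 3 above); the component for observations (section 3.10.2) is not included, and `missing_components` is a hypothetical helper, not a pyCIF function.

```python
# Components required by the CHIMERE plugin, transcribed from section 3.5;
# bioflux is handled separately because it depends on useemisb.
ALWAYS_REQUIRED = {"meteo", "flux", "inicond", "latcond", "topcond"}


def missing_components(components, useemisb):
    """Return the sorted names of required datavect components that are absent."""
    required = set(ALWAYS_REQUIRED)
    if useemisb:
        required.add("bioflux")
    return sorted(required - set(components))


example = {"meteo": {}, "flux": {}, "inicond": {}, "latcond": {}, "topcond": {}}
print(missing_components(example, useemisb=False))  # -> []
print(missing_components(example, useemisb=True))   # -> ['bioflux']
```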

3.10.2. Component for observations

There are three options to obtain CHIMERE outputs:

  1. Force the computation of the observation operator without observations: under construction (possibly with force_full_operator).

  2. Generate random observations: the yaml specifies the generation of random surface measurements for a set of parameters, using a plugin of type measurements. For example, for one measured species only:

concs:
  parameters:
    S1:
      plugin:
        name: random
        type: measurements
        version: param
      frequency: '1H'
      nstations: 5
      duration: '1H'
      random_subperiod_shift: True
      zmax: 100
      seed: True

  3. Make your own observation file

The component named concs is used for surface data; other types are also available, as described in the standard data vector. Here, an example is given for surface observations, with the matching yaml and a Python script to generate a monitor.nc file with one observation.

concs:
  parameters:
    S1:
      dir: /tmp/PYCIF_DATA_TEST/CHIMERE/ACADOK
      file: dummy_monitor.nc

The parameter names must be listed in the ACTIVE_SPECIES file.

To simply run a forward simulation, the dummy monitor file can be filled in with one observation dated at the final hour of the period, for instance with the following short Python script:

import datetime

import pandas as pd

# Put here the elements of datef (final date of the simulation, see section 3.1)
yearf = 2011
monthf = 3
dayf = 22
hourf = 9
minutef = 0
secondf = 0

# Put here the coordinates of any point in the domain (see repgrid in domain, file HCOORD)
lat0 = 48.3
lon0 = 1.2

datef = datetime.datetime(
    year=yearf, month=monthf, day=dayf,
    hour=hourf, minute=minutef, second=secondf,
)

# One dummy observation dated at the final hour of the period
data = pd.DataFrame({
    'date': [datef],
    'duration': [1.],
    'station': ['dummy'],
    'network': ['none'],
    'parameter': ['NONE'],
    'lon': [lon0],
    'lat': [lat0],
    'obs': [500.],
    'obserror': [5.],
    'alt': [1.],
})

# Convert to an xarray Dataset (requires xarray) and write it as NetCDF;
# the file name must match the dir/file given for the concs component
data.to_xarray().to_netcdf('dummy_monitor.nc')

Warning

When no observation is available, no error is raised: the CIF reports that the forward mode has been successfully executed, even though CHIMERE did not actually run.
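Given this behaviour, it may be worth checking beforehand that the observation dates actually fall within the simulated period. A minimal sketch with a hypothetical helper name; it takes a list of datetime objects (reading them from the monitor file, e.g. with xarray, is left out) and treats both datei and datef as included, which is an assumption about how the CIF selects observations.

```python
from datetime import datetime


def observations_in_period(obs_dates, datei, datef):
    """Return the observation dates within [datei, datef] (both bounds included)."""
    return [date for date in obs_dates if datei <= date <= datef]


datei = datetime(2011, 3, 22, 0)
datef = datetime(2011, 3, 22, 9)

# A dummy observation dated outside the simulated period
obs_dates = [datetime(2011, 2, 3, 0)]
if not observations_in_period(obs_dates, datei, datef):
    print("Warning: no observation between datei and datef")
```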