3. Elaborate the YAML for the CIF, using ready-made files

Important

How to use the cheat-sheet for plugins

The following sections require using plugins and providing their specifications. Arguments for each plugin are documented on its individual documentation page. To make finding plugins easier, the cheat-sheet organizes them by type (the leftmost column, e.g. chemistry, controlvect, fields). For each type, available plugins are listed with their name and version. Specifying a plugin’s name and version is mandatory; specifying its type is not always necessary.

3.1. Section for pyCIF parameters

This section must contain the five arguments shown in the example:

  • verbose controls the verbosity level: 1 for basic information, 2 for debugging

  • workdir is the working directory. The CIF will create it and use it to run the simulation and store all inputs and outputs. Choose a location with sufficient disk space.

  • logfile is the name of the log file written by the CIF, saved in workdir.

  • datei and datef are the start and end dates of the simulation period. Accepted formats: YYYY-mm-dd or YYYY-mm-dd HH:mm:ss.

#####################
# pyCIF config file #
#####################

# Define here all parameters for pyCIF following YAML syntax
# For details on YAML syntax, please see:
# http://docs.ansible.com/ansible/latest/YAMLSyntax.html

###############################################################################
# pyCIF parameters

verbose: 2
logfile: pycif.logtest
workdir: /tmp/CIF//.tox/py38/tmp/fwd_ref_chimere
datei: 2011-03-22 00:00:00
datef: 2011-03-22 09:00:00
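
Before launching a run, it can help to check that datei and datef use one of the two accepted formats and that the period is not empty. The sketch below is only an illustration using the Python standard library: parse_cif_date is a hypothetical helper, not a pyCIF function, and the CIF may perform its own parsing differently.

```python
from datetime import datetime

# Formats taken from the text above: YYYY-mm-dd or YYYY-mm-dd HH:mm:ss
ACCEPTED_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_cif_date(text):
    """Parse a datei/datef string (hypothetical helper, not part of pyCIF)."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unsupported date format: {text!r}")


# Values copied from the example YAML above
datei = parse_cif_date("2011-03-22 00:00:00")
datef = parse_cif_date("2011-03-22 09:00:00")

if datef <= datei:
    raise ValueError("datef must be later than datei")

print(datef - datei)  # 9:00:00
```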

This section of the YAML can also define anchors for use elsewhere in the file.

3.2. Mode (mode)

Here, a forward simulation is the chosen mode for running the model. Available plugins for the mode class are listed in the cheat-sheet. For the chosen plugin (forward simulation), provide its name and version, and note its requirements. The full description of the mode class lists all available arguments. For forward, no mandatory argument is required, but several optional arguments are available; the template YAML at the end of that page provides a complete list. In the example below, only reload_results is used, to avoid recomputing the whole simulation after an interruption.

# http://community-inversion.eu/documentation/plugins/modes/forward.html

mode:
  plugin:
    name: forward
    version: std

  reload_results: true

The requirements for the forward mode are Observation operator (obsoperator) and Control vector (controlvect). Both are specified in the following sections of the YAML file.

3.3. Observation operator (obsoperator)

Our chemistry-transport model maps from flux space to concentration space, which corresponds to the standard choice of obsoperator. For this standard obsoperator, no mandatory argument is required, but several optional arguments are available, as shown in the full template YAML. In our example, autorestart is used.

obsoperator:
  plugin:
    name: standard
    version: std
  autorestart: True

The requirements for the standard obsoperator are controlvect, datavect, model, obsvect, and platform.

3.4. Control vector (controlvect)

Currently, only the standard plugin is available for controlvect. For this standard controlvect, no mandatory argument is required, though several optional arguments are available. In this example, no optional argument is set, so default values apply.

controlvect:
  plugin:
    name: standard
    version: std

The requirements for the standard controlvect are datavect, domain, model and obsvect.

3.5. Model (model)

Use the CHIMERE model plugin. The typical configuration choices for a CHIMERE simulation (see CHIMERE documentation) are split between mandatory and optional arguments; default values are provided for optional ones. Check both the mandatory and optional arguments to configure the simulation as intended. The model settings must be consistent with the chemistry (see sections Locate the input files for CHIMERE and Chemistry (chemistry)) and the domain (see sections Locate the input files for CHIMERE and Domain (domain)).

# http://community-inversion.eu/documentation/plugins/models/chimere.html

model:
  plugin:
    name: CHIMERE
    version: std

  auto-recompile: true
  dir_sources: /tmp/CIF//model_sources/chimereGES
  direxec: /tmp/PYCIF_DATA_TEST/CHIMERE/CHIMERE_executables
  ichemstep: 1
  ideepconv: 0
  nivout: 17
  nlevemis: 17
  nmdoms: 1
  nphour_ref: 6
  nzdoms: 1
  periods: 3H
  usechemistry: 1
  usedepos: 1
  usewetdepos: 1

The requirements for CHIMERE are domain, chemistry and a set of components (corresponding to the inputs of CHIMERE itself) to be detailed in datavect: meteo, flux, bioflux, latcond, topcond and inicond.

3.6. Observation vector (obsvect)

To avoid unnecessary computations, the CIF only runs the simulation up to the time covered by available observations. The standard obsvect must therefore be initialized. See section Component for observations for how to provide minimal dummy observation data.

# http://community-inversion.eu/documentation/plugins/obsvects/standard.html

obsvect:
  plugin:
    name: standard
    version: std

  dump: true

The requirements for the standard obsvect are datavect and model.

3.7. Platform (platform)

Specify the computing platform on which the simulation will run, so that the CIF can choose the right configuration and perform platform-specific operations such as module load for the relevant modules. In this example, the platform is set to LSCE on the obelix cluster.

platform:
  plugin:
    name: LSCE
    version: obelix

The only requirement for the platform is model.
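
The requirements stated in sections 3.2 to 3.7 form a small dependency graph. As a rough sketch (the dictionary below simply transcribes the requirements quoted in this tutorial, and required_sections is a hypothetical helper, not a pyCIF function), the full set of YAML sections needed for a forward run can be collected as follows:

```python
# Requirements as stated in sections 3.2 to 3.7 of this tutorial.
# The model also requires datavect components (meteo, flux, ...), which are
# described with the data vector and are not listed here.
REQUIREMENTS = {
    "mode": ["obsoperator", "controlvect"],
    "obsoperator": ["controlvect", "datavect", "model", "obsvect", "platform"],
    "controlvect": ["datavect", "domain", "model", "obsvect"],
    "model": ["domain", "chemistry"],
    "obsvect": ["datavect", "model"],
    "platform": ["model"],
}


def required_sections(start):
    """Return every section reachable from `start` through requirements."""
    seen = set()
    stack = [start]
    while stack:
        current = stack.pop()
        for dep in REQUIREMENTS.get(current, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


sections = required_sections("mode")
print(sorted(sections))
# ['chemistry', 'controlvect', 'datavect', 'domain', 'model',
#  'obsoperator', 'obsvect', 'platform']
```

Besides the sections already written, this leaves domain, chemistry and datavect, which are the subject of the remaining sections.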

3.8. Domain (domain)

Specify a domain for CHIMERE (see also the cheat-sheet) that is consistent with the pre-computed input files (see step 2). The files defining the domain can be stored directly in the repgrid directory, or symbolic links to them can be used.

domain:
  plugin:
    name: CHIMERE
    version: std
  repgrid: a_path_for_CHIMERE_COORD_definition_files/
  domid: MYDOMAIN
  nlev: 20
  p1: 997
  pmax: 200
  pressure_unit: hPa

3.9. Chemistry (chemistry)

The only chemical scheme type currently available is photolysis with tabulated J values (photolysis rates). The scheme itself must be pre-computed (see step 2).

chemistry:
  plugin:
    name: CHIMERE
    version: gasJtab
  schemeid: name.chemistry
  dir_precomp: the_path_for_the_directory_of_which_chemical_scheme_named_above_is_a_subdir/

3.10. Data vector (datavect)

The data vector contains the input data for the model (e.g. emission fluxes) and for comparison to observations (e.g. concentration data), used by controlvect, obsoperator, and obsvect to build the run configuration.

Currently, only the standard datavect plugin is available.

datavect:
  plugin:
    name: standard
    version: std

For the first forward simulation, its components are the requirements of the model plugin not already covered in earlier YAML sections (i.e. excluding domain and chemistry), plus a component for the (dummy) observation data.

3.10.1. CHIMERE usual inputs

The components that provide the model with its inputs must be chosen from the available datastreams recognized by the model plugin, which uses them to pre-process or fetch its inputs. For CHIMERE, the plugin expects flux to provide hourly-averaged emission data (i.e. data for the AEMISSIONS file).

For each datastream component, datavect requires at least three pieces of information:

  1. dir: the directory where the component’s data files are located (mandatory)

  2. file: either a fixed filename or a pattern for a set of files using date placeholders (year, month, day, hour, etc.) (mandatory)

  3. file_freq: the time span covered by each file, in pandas duration format, e.g. 1D, 120H (mandatory, except for initial conditions)

To distinguish boundary condition types (lateral, top, initial), comp_type must also be specified.

These arguments belong to datavect, not to any particular plugin (see also optional general arguments here).
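
To see how file and file_freq work together, the sketch below lists the file names that a pattern would produce over a simulation period. It is only an illustration under stated assumptions: the date placeholders are assumed to behave like Python strftime codes, the first file is assumed to start at datei, and parse_freq and expected_files are hypothetical helpers (not pyCIF functions) that only understand simple nH and nD frequencies.

```python
import re
from datetime import datetime, timedelta


def parse_freq(freq):
    """Convert a simple 'nH' or 'nD' string to a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([HD])", freq)
    if match is None:
        raise ValueError(f"Unsupported frequency: {freq!r}")
    count, unit = int(match.group(1)), match.group(2)
    return timedelta(hours=count) if unit == "H" else timedelta(days=count)


def expected_files(pattern, datei, datef, file_freq):
    """List the names obtained by expanding `pattern` every `file_freq`."""
    step = parse_freq(file_freq)
    names = []
    current = datei
    while current < datef:
        names.append(current.strftime(pattern))
        current += step
    return names


# Period and pattern taken from the examples in this tutorial
files = expected_files(
    "METEO.%Y%m%d%H.3.nc",
    datetime(2011, 3, 22, 0),
    datetime(2011, 3, 22, 9),
    "3H",
)
for name in files:
    print(name)
# METEO.2011032200.3.nc
# METEO.2011032203.3.nc
# METEO.2011032206.3.nc
```

Comparing such a list with the content of dir is a quick way to spot a missing file or a file_freq that does not match the files.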

When a plugin is used by a component to process the specified files, its name, version, and type must be specified, along with its own arguments.

The datastream components for model inputs point to plugins that handle the inputs required by CHIMERE. They are described in the numbered items below: meteo for meteorological inputs (item 1), flux and bioflux for emission fluxes (items 2 and 3), and latcond, topcond and inicond for boundary and initial conditions (item 4).

  1. With pre-computed METEO.nc files, specifying the meteo component is straightforward:

    1. provide the minimum information: dir and file pointing to ready-made METEO.nc files, and the matching file_freq.

    2. add the plugin for CHIMERE’s ready-made meteo files (see also the cheat-sheet)

    datavect:
      components:
        meteo:
          plugin:
            name: CHIMERE
            version: std
          dir: /tmp/PYCIF_DATA_TEST/CHIMERE/ACADOK
          file: 'METEO.%Y%m%d%H.3.nc'
          file_freq: 3H
    

    In this case, METEO.nc filenames follow the format used by CHIMERE. A custom template is also possible, e.g. file: METEO_some_etiket.%Y%m%d%H.X.nc.

  2. Pre-computed files for fluxes, biofluxes, inicond, latcond, and topcond are specified in the same way.

    Note

    All pre-computed files must be mutually consistent and consistent with the domain, the pyCIF parameters, and the model settings, in particular the optional argument periods (default: 1D = 24 hours).

    Following the same pattern as for meteorology, pre-computed AEMISSIONS.nc files are specified with dir and file, handled by the plugin for CHIMERE fluxes. Use the cheat-sheet to find the correct type, name, and version:

    datavect:
      plugin:
        name: standard
        version: std
      components:
        meteo:
          dir: directory_containing_METEO.YYYYMMDDHH.*.nc_files
          file: METEO.%Y%m%d%H.X.nc
          plugin:
            name: CHIMERE
            version: std
            type: meteo
          file_freq: XH
        flux:
          dir: directory_containing_AEMISSIONS.YYYYMMDDHH.*.nc_files
          file: AEMISSIONS.%Y%m%d%H.X.nc
          plugin:
            name: CHIMERE
            version: AEMISSIONS
            type: flux
          file_freq: XH
    

    The CIF expects AEMISSIONS.nc files in the exact format used by CHIMERE, containing all species listed in ANTHROPIC.

    It is also possible to combine different ready-made files for species that do not require sub-hourly interpolation. These species must be listed as parameters of flux, with names matching those in ANTHROPIC. Each parameter can be configured independently, as shown in the tutorial for more elaborated inputs.

  3. The bioflux component follows the same principles as flux. If no sub-hourly flux interpolation is needed (useemisb is False in model), this component can be omitted. If it is omitted while useemisb is True, an error is raised.

    Pre-computed BEMISSIONS.nc files are specified in the same way as AEMISSIONS, using the plugin for CHIMERE fluxes:

    datavect:
      plugin:
        name: standard
        version: std
      components:
        meteo:
          dir: directory_containing_METEO.YYYYMMDDHH.*.nc_files
          file: METEO.%Y%m%d%H.X.nc
          plugin:
            name: CHIMERE
            version: std
            type: meteo
          file_freq: XH
        flux:
          dir: directory_containing_AEMISSIONS.YYYYMMDDHH.*.nc_files
          file: AEMISSIONS.%Y%m%d%H.X.nc
          plugin:
            name: CHIMERE
            version: AEMISSIONS
            type: flux
          file_freq: XH
        bioflux:
          dir: directory_containing_BEMISSIONS.YYYYMMDDHH.*.nc_files
          file: BEMISSIONS.%Y%m%d%H.X.nc
          plugin:
            name: CHIMERE
            version: AEMISSIONS
            type: flux
          emis_type: bio
          file_freq: XH
    

    Note that emis_type must be specified explicitly so that BEMISSIONS files are fetched instead of AEMISSIONS files. When combining files for different parameters, the parameter names must match those listed in BIOGENIC.

  4. The inicond, latcond, and topcond components are distinguished by their comp_type; otherwise their specifications follow the same principles as flux.

    Pre-computed INI_CONCS.nc files and BOUN_CONCS.nc files are specified in the same way as AEMISSIONS, using the plugin for CHIMERE fields:

    datavect:
      plugin:
        name: standard
        version: std
      components:
        meteo:
          dir: directory_containing_METEO.YYYYMMDDHH.*.nc_files
          file: METEO.%Y%m%d%H.X.nc
          plugin:
            name: CHIMERE
            version: std
            type: meteo
          file_freq: XH
        flux:
          dir: directory_containing_AEMISSIONS.YYYYMMDDHH.*.nc_files
          file: AEMISSIONS.%Y%m%d%H.X.nc
          plugin:
            name: CHIMERE
            version: AEMISSIONS
            type: flux
          file_freq: XH
        inicond:
          dir: directory_containing_IC.YYYYMMDDHH.*.nc_files
          file: INI_CONCS.0.nc  # open question in the source: is this file name mandatory?
          plugin:
            name: CHIMERE
            version: icbc
            type: field
          comp_type: inicond
        latcond:
          dir: directory_containing_BC.YYYYMMDDHH.*.nc_files
          file: BOUN_CONCS.%Y%m%d%H.X.nc
          plugin:
            name: CHIMERE
            version: icbc
            type: field
          file_freq: XH
          comp_type: latcond
        topcond:
          dir: directory_containing_BC.YYYYMMDDHH.*.nc_files
          file: BOUN_CONCS.%Y%m%d%H.X.nc
          plugin:
            name: CHIMERE
            version: icbc
            type: field
          file_freq: XH
          comp_type: topcond
    

    In this configuration, all species listed in ACTIVE_SPECIES are read from the same input files. To use different files for different species, see the tutorial for more elaborated inputs.
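
The rules given in this section (mandatory dir and file, file_freq except for initial conditions, comp_type for boundary and initial conditions, bioflux optional when useemisb is False) can be summarized as a small pre-flight check. This is only a sketch: check_components is a hypothetical helper, not a pyCIF function, and the components mapping is assumed to be the datavect components block already loaded into a Python dictionary.

```python
# Inputs required by the CHIMERE plugin, as listed in section 3.5
MODEL_INPUTS = ["meteo", "flux", "bioflux", "latcond", "topcond", "inicond"]


def check_components(components, useemisb=True):
    """Return a list of problems found in a datavect 'components' mapping."""
    problems = []
    for name in MODEL_INPUTS:
        if name == "bioflux" and not useemisb:
            continue  # bioflux may be omitted when useemisb is False
        comp = components.get(name)
        if comp is None:
            problems.append(f"{name}: component is missing")
            continue
        required = ["dir", "file"]
        if name != "inicond":
            required.append("file_freq")  # not mandatory for initial conditions
        if name in ("inicond", "latcond", "topcond"):
            required.append("comp_type")  # distinguishes the condition types
        for key in required:
            if key not in comp:
                problems.append(f"{name}: missing '{key}'")
    return problems


# Deliberately incomplete example: latcond lacks file_freq, topcond is absent
components = {
    "meteo": {"dir": "some/dir", "file": "METEO.%Y%m%d%H.3.nc", "file_freq": "3H"},
    "flux": {"dir": "some/dir", "file": "AEMISSIONS.%Y%m%d%H.3.nc", "file_freq": "3H"},
    "inicond": {"dir": "some/dir", "file": "INI_CONCS.0.nc", "comp_type": "inicond"},
    "latcond": {"dir": "some/dir", "file": "BOUN_CONCS.%Y%m%d%H.3.nc", "comp_type": "latcond"},
}

problems = check_components(components, useemisb=False)
for problem in problems:
    print(problem)
# latcond: missing 'file_freq'
# topcond: component is missing
```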

3.10.2. Component for observations

There are three options for providing observation data to drive CHIMERE.

  1. Force computation of the observation operator without real observations (under construction).

  2. Generate random observations: specify random surface measurements for a set of parameters using the measurements plugin in the YAML. For example, for a single measured species:

concs:
  parameters:
    S1:
      plugin:
        name: random
        type: measurements
        version: param
      frequency: '1h'
      nstations: 5
      duration: '1h'
      random_subperiod_shift: True
      zmax: 100
      seed: True

  3. Provide your own observation file.

The concs component is used for surface data; other types are available as described in the standard data vector. Below is an example for surface observations, including the matching YAML snippet and a Python script to generate a minimal monitor.nc file.

concs:
  parameters:
    S1:
      dir: /tmp/PYCIF_DATA_TEST/CHIMERE/ACADOK
      file: dummy_monitor.nc

Parameter names must match entries in the ACTIVE_SPECIES file.

To simply run a forward simulation, the dummy monitor file needs only one observation at the final hour of the simulation period. The following short Python script generates such a file:

import datetime

import pandas as pd

# End of the simulation period: copy the elements of datef here
yearf = 2011
monthf = 2
dayf = 3
hourf = 0
minutef = 0
secondf = 0

# Coordinates of any point inside the domain (see repgrid in domain, file HCOORD)
lat0 = 1.2
lon0 = 48.3

datef = datetime.datetime(
    year=yearf, month=monthf, day=dayf, hour=hourf, minute=minutef, second=secondf
)

# One row = one dummy observation at datef, with the basic monitor columns
data = pd.DataFrame(
    {
        "date": [datef],
        "duration": [1.0],
        "station": ["dummy"],
        "network": ["none"],
        "parameter": ["NONE"],
        "lon": [lon0],
        "lat": [lat0],
        "obs": [500.0],
        "obserror": [5.0],
        "alt": [1.0],
    }
)

# Writing NetCDF from pandas requires the xarray package and a NetCDF backend.
# Save the file under the name and directory given by file and dir in concs.
data.to_xarray().to_netcdf("monitor_obs_for_simu_ID.nc")

Warning

When no observation is available, no error is raised: the CIF reports that the forward mode was executed successfully even though CHIMERE did not actually run.
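
Because this situation is silent, a quick check of the observation dates against the simulation period can save a wasted run. The sketch below is an illustration only: observations_in_window is a hypothetical helper, not a pyCIF function, and treating both datei and datef as inclusive bounds is an assumption.

```python
from datetime import datetime


def observations_in_window(obs_dates, datei, datef):
    """Return the observation dates that fall within [datei, datef]."""
    return [date for date in obs_dates if datei <= date <= datef]


# Simulation period from the example pyCIF parameters
datei = datetime(2011, 3, 22, 0)
datef = datetime(2011, 3, 22, 9)

# Hypothetical observation dates: the first is outside the period
obs_dates = [datetime(2011, 2, 3, 0), datetime(2011, 3, 22, 9)]

selected = observations_in_window(obs_dates, datei, datef)
if not selected:
    print("No observation in the simulation period: check the monitor file.")
else:
    print(f"{len(selected)} observation(s) in the simulation period")
# 1 observation(s) in the simulation period
```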