Generating emission files from raw data

Here we use raw data of various types to generate emission files in the right format for CHIMERE.

Good practices

If you are a new user of the CIF, or if you are using a type of raw data for the first time, it is safer to follow these steps:

  1. Prepare a yaml for only generating the emissions (as shown below).

  2. Run the system with this yaml.

  3. Check the generated inputs, as explained in Checking the input files.

Principles

To internally pre-process various types of files, the CIF reads files of known formats and applies a chosen set of transformations (such as interpolation in space) to obtain input netcdf files for CHIMERE.

The supported raw inputs are the ones supported by the various available plugins for the emissions, which are found in datastreams of type Flux (see also cheat-sheet).

It is possible to specify different input sources for the various emitted species which do not require a sub-hourly interpolation. These species must be listed as parameters of flux and their names must match the names in ANTHROPIC. For each parameter, the specifications of the component plugin are inherited by default but it is possible to individualize everything, including the plugin to use, as shown in the following examples.

The information to provide is:

  1. along with the name, type and version of the plugin chosen to handle the raw files, the information specific to that plugin, as described in the plugin’s documentation.

  2. up to four pieces of information relative to the component, as expected by datavect (see CHIMERE usual inputs and datavect). Note that varname will very likely be needed, since raw external files rarely use the same species names as CHIMERE.

  3. recipes to build the transformations from the raw data to CHIMERE’s inputs, i.e. the spatial and temporal interpolations from the initial data to the domain’s grid and the unit conversion. Each recipe is a keyword with arguments which, in the core of the CIF, are linked to plugins. This is why the links in the documentation point to plugins, even though the user does not access them directly in the yaml.

    1. the recipe for building the transformation regrid for the spatial interpolation. If no regrid is specified, the CIF will do a bilinear regridding by default.

    2. if required, the recipe for building the transformation vertical_interpolation for the vertical interpolation. As above, if no method is specified, the default is linear.

    3. the recipe for building the transformation time_interpolation for the temporal interpolation. No options are available yet (under construction).

    4. the recipe for building the transformation unit_conversion for converting units
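To illustrate why the choice of regrid method matters for emissions, here is a minimal one-dimensional sketch of a mass-conserving regridding, the kind of behaviour the mass-conservation method used in the examples of this page is named for. It is not the CIF's implementation: the function conservative_regrid_1d and the grids are hypothetical and only show the principle of weighting each source cell by its overlap with the target cell.

```python
def conservative_regrid_1d(src_edges, src_flux, dst_edges):
    """Overlap-weighted average of source flux densities on each target cell.

    Hypothetical helper for illustration only (not part of the CIF).
    Fluxes are densities (amount per unit length), edges are increasing.
    """
    dst_flux = []
    for d0, d1 in zip(dst_edges[:-1], dst_edges[1:]):
        mass = 0.0
        for s0, s1, flux in zip(src_edges[:-1], src_edges[1:], src_flux):
            # Length of the intersection between source and target cells
            overlap = max(0.0, min(d1, s1) - max(d0, s0))
            mass += flux * overlap
        dst_flux.append(mass / (d1 - d0))
    return dst_flux


def total_mass(edges, flux):
    """Integrate a flux density over the whole grid."""
    return sum(f * (b - a) for a, b, f in zip(edges[:-1], edges[1:], flux))


src_edges = [0.0, 1.0, 2.0, 3.0, 4.0]   # fine raw grid
src_flux = [1.0, 3.0, 5.0, 7.0]
dst_edges = [0.0, 1.5, 4.0]             # coarser, irregular target grid

dst_flux = conservative_regrid_1d(src_edges, src_flux, dst_edges)
src_mass = total_mass(src_edges, src_flux)
dst_mass = total_mass(dst_edges, dst_flux)
```

The total amount emitted is the same on both grids. An interpolation such as the bilinear default samples the field at target points instead, so the total is not guaranteed to be preserved when the resolutions differ.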

Remarks on the vertical interpolation:

  • if no method is specified, the default interpolation method is linear.

  • if the number of levels used for the emissions, specified in the model as nlevemis, is 1 but the original raw emissions have several levels, one expects all emissions to be projected onto the single CHIMERE level. In that case, linear interpolation does not give the expected result: the method closest should be chosen instead.
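The second remark can be illustrated with a small sketch. It assumes that a closest-type projection assigns each raw level to the nearest target level and accumulates the values, which is one plausible reading of the text above and not a description of the CIF's code. The two functions and the heights are hypothetical.

```python
def linear_at_height(src_heights, src_values, target_height):
    """Linear interpolation of a profile at one height (clamped at the ends)."""
    if target_height <= src_heights[0]:
        return src_values[0]
    if target_height >= src_heights[-1]:
        return src_values[-1]
    for i in range(len(src_heights) - 1):
        z0, z1 = src_heights[i], src_heights[i + 1]
        if z0 <= target_height <= z1:
            weight = (target_height - z0) / (z1 - z0)
            return (1.0 - weight) * src_values[i] + weight * src_values[i + 1]


def closest_accumulate(src_heights, src_values, target_heights):
    """Add each raw level to its nearest target level (assumed behaviour)."""
    out = [0.0] * len(target_heights)
    for height, value in zip(src_heights, src_values):
        nearest = min(range(len(target_heights)),
                      key=lambda j: abs(target_heights[j] - height))
        out[nearest] += value
    return out


# Hypothetical raw emissions on three levels, one single target level
src_heights = [10.0, 100.0, 300.0]
src_values = [5.0, 3.0, 2.0]
target_heights = [20.0]

column_total = sum(src_values)
linear_value = linear_at_height(src_heights, src_values, target_heights[0])
closest_values = closest_accumulate(src_heights, src_values, target_heights)
```

With a single target level, the linear interpolation returns a blend of the two neighbouring raw levels and drops part of the column, whereas the accumulation keeps the full column total on that level.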

Examples

Only one species taken from a raw inventory on one level

For a chemical scheme with only one hourly-interpolated species taken from a raw inventory if nlevemis = 1:

datavect:
  plugin:
    name: standard
    version: std
  components:
    meteo:
      dir: directory_containing_METEO.YYYYMMDDHH.*.nc_files
      file: METEO.%Y%m%d%H.X.nc
      plugin:
        name: CHIMERE
        version: std
        type: meteo
      file_freq: XH
    flux:
      dir: /home/comdata1/flux/EDGARV5/TOTAL/
      file: v50_N2O_%Y.0.1x0.1.nc
      varname: emi_n2o
      plugin:
        name: EDGAR
        type: flux
        version:  v5
      closest_year: True
      file_freq: 1Y
      regrid:
        method: mass-conservation
      time_interpolation:
        method: linear
      unit_conversion: # EDGAR: kg/m2/s -> molec/cm2/s
        scale: 1.368e+21

Note how the input arguments specific to the EDGAR v5 plugin are specified, here the optional argument closest_year, which controls whether the closest available year is used.
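As a rough illustration of what choosing the closest available year means for a yearly file pattern, the sketch below expands the pattern from the example, assuming the date placeholders follow the usual strftime conventions. The helper names and the list of available years are hypothetical, and the logic is not taken from the EDGAR plugin.

```python
from datetime import datetime


def closest_available_year(requested_year, available_years):
    """Return the available year nearest to the requested one.

    Ties are resolved in favour of the earlier year (arbitrary choice).
    """
    return min(available_years, key=lambda y: (abs(y - requested_year), y))


def yearly_file_name(pattern, date, available_years, closest_year=True):
    """Expand a strftime-style pattern, falling back to the closest year."""
    year = date.year
    if year not in available_years:
        if not closest_year:
            raise FileNotFoundError(f"no raw file available for {year}")
        year = closest_available_year(year, available_years)
    # Yearly files: only the year matters, so use 1 January of that year
    return datetime(year, 1, 1).strftime(pattern)


pattern = "v50_N2O_%Y.0.1x0.1.nc"           # pattern from the example above
available_years = list(range(2000, 2016))   # hypothetical coverage
name = yearly_file_name(pattern, datetime(2019, 3, 1), available_years)
print(name)  # v50_N2O_2015.0.1x0.1.nc
```

In this sketch, disabling the fallback makes a simulation outside the covered period fail instead of silently using another year.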

Only one species taken from a raw inventory on several levels

For a chemical scheme with only one hourly-interpolated species taken from a raw inventory if nlevemis = nlev:

datavect:
  plugin:
    name: standard
    version: std
  components:
   meteo:
     dir: directory_containing_METEO.YYYYMMDDHH.*.nc_files
     file: METEO.%Y%m%d%H.X.nc
     plugin:
       name: CHIMERE
       version: std
       type: meteo
     file_freq: XH
   flux:
     dir: /home/comdata1/Fluxes/EDGARV5/TOTAL/
     file: v50_N2O_%Y.0.1x0.1.nc
     varname: emi_n2o
     plugin:
       name: EDGAR
       version:  v5
       type: flux
     file_freq: 1Y
     regrid:
       method: mass-conservation
     time_interpolation:
       method: linear
     unit_conversion: # EDGAR: kg/m2/s -> molec/cm2/s
       scale: 1.368e+21
     vertical_interpolation:
       method: closest
       fill_nans: False
       fill_nans_value: 0

WARNING: if several species are emitted and the fluxes are specified as above, all the emitted species are read from the same varname and therefore have the same emissions.
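The scale 1.368e+21 used in the unit_conversion of the examples above is consistent with a conversion from kg/m2/s to molec/cm2/s for N2O. The sketch below reproduces it, assuming the raw flux is expressed in kg of N2O and using a molar mass of 44.013 g/mol. The function name is hypothetical. For another species, the molar mass has to be changed accordingly.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
M_N2O = 44.013             # g/mol, molar mass of N2O (2 x 14.007 + 15.999)


def kg_m2_s_to_molec_cm2_s(molar_mass_g_per_mol):
    """Scale factor from kg/m2/s to molec/cm2/s for a given species."""
    grams_per_kg = 1000.0
    cm2_per_m2 = 1.0e4
    moles_per_kg = grams_per_kg / molar_mass_g_per_mol
    return moles_per_kg * AVOGADRO / cm2_per_m2


scale = kg_m2_s_to_molec_cm2_s(M_N2O)
print(f"{scale:.3e}")  # 1.368e+21
```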

Various species from a raw inventory

To take various emitted species from a raw inventory, as many parameters as emitted species listed in ANTHROPIC must be specified, each one with its particularities:

datavect:
  plugin:
    name: standard
    version: std
  components:
    meteo:
      dir: directory_containing_METEO.YYYYMMDDHH.*.nc_files
      file: METEO.%Y%m%d%H.X.nc
      plugin:
        name: CHIMERE
        version: std
        type: meteo
      file_freq: XH
    flux:
      dir: directory_containing_raw_v50.nc_EDGAR_files
      file: v50_%Y.0.1x0.1.nc
      plugin:
        name: EDGAR
        version: v5
        type: flux
      file_freq: 1Y
      regrid:
        method: mass-conservation
      time_interpolation:
        method: linear
      unit_conversion:
        scale: 1e+6
      parameters:
        S1:
          varname: emi_S1
          plugin:
            name: EDGAR
            version: v5
            type: flux
        S2:
          varname: emi_S2
          plugin:
            name: EDGAR
            version: v5
            type: flux
          unit_conversion:
            scale: 1e+2

Note how, for each parameter, the varname information expected by datavect and the unit_conversion used by the flux plugin can be either specified explicitly or inherited from the component.
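The inheritance between the component and its parameters can be pictured as a dictionary merge. The sketch below is only a mental model built on the example above: the function resolve_parameter is hypothetical and the CIF may resolve nested keys differently.

```python
import copy


def resolve_parameter(component, name):
    """Component-level settings, overridden by the parameter's own keys."""
    resolved = {key: copy.deepcopy(value)
                for key, value in component.items() if key != "parameters"}
    # A key given at the parameter level replaces the inherited one entirely
    resolved.update(copy.deepcopy(component["parameters"][name]))
    return resolved


# Subset of the flux component from the example above
flux = {
    "file": "v50_%Y.0.1x0.1.nc",
    "file_freq": "1Y",
    "regrid": {"method": "mass-conservation"},
    "unit_conversion": {"scale": 1e6},
    "parameters": {
        "S1": {"varname": "emi_S1"},
        "S2": {"varname": "emi_S2", "unit_conversion": {"scale": 1e2}},
    },
}

s1 = resolve_parameter(flux, "S1")
s2 = resolve_parameter(flux, "S2")
```

Here S1 keeps the component's scale of 1e+6, while S2 uses its own scale of 1e+2. Both keep the component's regrid method and file pattern.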

WARNING: since only S1 and S2 are described, an exception is raised if other ANTHROPIC species exist. To avoid this, it can be useful to use a general set of files for all species except a few, which are detailed as parameters. This typically happens in tests where one species (or a very few) is modified relative to reference AEMISSIONS files. This is illustrated in the examples below.
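Before running the system, the consistency between the species list and the yaml can be checked by hand. The sketch below assumes the species names have already been read from ANTHROPIC into a list. The function name and the extra species S3 are hypothetical.

```python
def missing_species(anthropic_species, flux_parameters):
    """List the ANTHROPIC species that have no entry under parameters."""
    return [name for name in anthropic_species if name not in flux_parameters]


anthropic_species = ["S1", "S2", "S3"]   # S3 is a hypothetical extra species
flux_parameters = {
    "S1": {"varname": "emi_S1"},
    "S2": {"varname": "emi_S2"},
}

missing = missing_species(anthropic_species, flux_parameters)
if missing:
    print("Species without a parameter entry:", ", ".join(missing))
```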

One species (among several) taken from a raw inventory on one level

For only one species of the whole chemical scheme taken from a raw inventory if nlevemis = 1:

datavect:
  plugin:
    name: standard
    version: std
  components:
   meteo:
     dir: directory_containing_METEO.YYYYMMDDHH.*.nc_files
     file: METEO.%Y%m%d%H.X.nc
     plugin:
       name: CHIMERE
       version: std
       type: meteo
     file_freq: XH
   flux:
     dir: directory_containing_AEMISSIONS.YYYYMMDDHH.*.nc_files_on_1_level
     file: AEMISSIONS.%Y%m%d%H.X.nc
     plugin:
       name: CHIMERE
       version: AEMISSIONS
       type: flux
     file_freq: XH
     parameters:
       S1:
         dir: /directory_containing_raw_EDGARV5_files/
         file: v50_N2O_%Y.0.1x0.1.nc
         varname: emi_n2o
         plugin:
           name: EDGAR
           version:  v5
           type: flux
         file_freq: 1Y
         regrid:
           method: mass-conservation
         time_interpolation:
           method: linear
         unit_conversion: # EDGAR: kg/m2/s -> molec/cm2/s
           scale: 1.368e+21

One species (among several) taken from a raw inventory on several levels

For only one species of the whole chemical scheme taken from a raw inventory if nlevemis = nlev:

datavect:
  plugin:
    name: standard
    version: std
  components:
   meteo:
     dir: directory_containing_METEO.YYYYMMDDHH.*.nc_files
     file: METEO.%Y%m%d%H.X.nc
     plugin:
       name: CHIMERE
       version: std
       type: meteo
     file_freq: XH
   flux:
     dir: directory_containing_AEMISSIONS.YYYYMMDDHH.*.nc_files_on_nlev_levels
     file: AEMISSIONS.%Y%m%d%H.X.nc
     plugin:
       name: CHIMERE
       version: AEMISSIONS
       type: flux
     file_freq: XH
     parameters:
       S1:
         dir: /directory_containing_raw_EDGARV5_files/
         file: v50_N2O_%Y.0.1x0.1.nc
         varname: emi_n2o
         plugin:
           name: EDGAR
           version: v5
           type: flux
         file_freq: 1Y
         regrid:
           method: mass-conservation
         time_interpolation:
           method: linear
         unit_conversion: # EDGAR: kg/m2/s -> molec/cm2/s
           scale: 1.368e+21
         vertical_interpolation:
           method: closest
           fill_nans: False
           fill_nans_value: 0

Note that an error will be raised if the number of levels in the pre-processed AEMISSIONS files is not consistent.

Various plugins can be used for various (sets of) species following the same principle.

More advanced users may add plugins as required for new inputs: see details here

The bioflux specifications follow the same principles as flux. All the examples provided for AEMISSIONS apply to BEMISSIONS, with the addition of emis_type.