Standard CIF data vector `standard/std`#

Description#

This is the standard pyCIF implementation of the datavect class. Information about inputs is split into component/parameter categories. Component and parameter names are fully flexible, but they should be consistent with the rest of the configuration.

General component categories include for instance:

concs: observed concentrations
fluxes: emission fluxes
inicond: initial conditions
meteo: meteorological fields

For each component, multiple parameters can be defined depending on diverse species, sectors, etc.

The datavect object is used to define the controlvect and obsvect objects. Therefore, arguments other than those specific to the datavect can be used in each component/parameter. Please see details of such additional arguments here and here.

YAML arguments#

The following arguments are used to configure the plugin. pyCIF will raise an exception at initialization if mandatory arguments are not specified, or if any argument does not match the accepted values or type:

Optional arguments#

dump_debug : bool, optional, default False: Save extra information for debugging purposes. It includes the list of files and dates for each input, saved in $workdir/datavect/

components : optional

List of components in the data vector

Argument structure:

any_key : optional

Name of a given component

Argument structure:

dir : str, optional, default “”: Path to the corresponding component. This value is used if not provided in parameters

file : str, optional, default “”: File format in the given directory. This value is used if not provided in parameters

varname : str, optional, default “”: Variable name to use to read data files instead of the parameter name, if different from the parameter name

file_freq : str, optional, default “”: Temporal frequency to fetch files

split_freq : str, optional: Force splitting the processing at a given frequency different to file_freq

parameters : optional

Store the list of parameters for this component

Argument structure:

any_key : optional

Name of a given parameter

Argument structure:

dir : str, optional, default “”: Path to the corresponding component. This value is used if not provided in parameters

file : str, optional, default “”: File format in the given directory. This value is used if not provided in parameters

varname : str, optional, default “”: Variable name to use to read data files instead of the parameter name, if different from the parameter name

file_freq : str, optional, default “”: Temporal frequency to fetch files

split_freq : str, optional: Force splitting the processing at a given frequency different to file_freq

hresol : “hpixels” or “regions” or “bands” or “ibands” or “global”, optional: the horizontal resolution of the control vector.

Warning

This argument determines whether the parameter is included in the control vector. All other arguments will be ignored if this one is not specified.

“hpixels”: use the native resolution of the corresponding data
“regions”: aggregate pixels into regions using a mask specified by the user
“bands”: aggregate pixels by lon/lat bands
“ibands”: aggregate pixels by column/row index bands
“global”: optimize one factor for the whole spatial extent of the data

vresol : “vpixels” or “kbands” or “column”, optional, default “column”: the vertical resolution of the control vector.

“vpixels”: use the native resolution of the corresponding data
“kbands”: aggregate pixels into vertical bands by level index
“column”: (default) optimize one factor for the whole vertical extent of the data

tresol : str, optional: the main temporal resolution of the control vector. Should be a pandas frequency string. If not specified, a single increment is used for the full inversion window

tsubresol : None, optional

secondary resolution for the control vector. If tsubresol is not a divider of tresol, the final temporal resolution keeps the tresol periods as anchors and then splits them according to tsubresol, adjusting the size of the last sub-period of each period.

For instance, if tresol is 1MS and tsubresol is 10D, the control vector will have a monthly resolution with 3 sub-periods per month: the first two sub-periods are 10 days long according to tsubresol, and the third sub-period fills the remaining days of the month, hence from 8 days (for a 28-day February) to 11 days (for 31-day months)
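The 1MS / 10D example above can be reproduced with a short sketch that uses only the Python standard library. The helper name `split_month` and the rounding rule that decides how many sub-periods fit in a month are assumptions chosen to match the example in the text; they are not taken from the pyCIF source.

```python
import calendar
from datetime import date, timedelta


def split_month(year, month, sub_days=10):
    """Split one month into sub-periods of `sub_days` days.

    Hypothetical helper: the last sub-period absorbs whatever is left
    of the month, so it can be shorter or longer than `sub_days`.
    """
    n_days = calendar.monthrange(year, month)[1]
    # Assumed rule: nearest whole number of sub-periods, at least one
    n_sub = max(1, int(n_days / sub_days + 0.5))
    lengths = [sub_days] * (n_sub - 1)
    lengths.append(n_days - sub_days * (n_sub - 1))

    periods = []
    start = date(year, month, 1)
    for length in lengths:
        periods.append((start, length))
        start += timedelta(days=length)
    return periods


for year, month in [(2021, 1), (2021, 2), (2020, 2)]:
    lengths = [length for _, length in split_month(year, month)]
    print(year, month, lengths)
# 2021 1 [10, 10, 11]
# 2021 2 [10, 10, 8]
# 2020 2 [10, 10, 9]
```

Each tuple holds the start date of a sub-period and its length in days, so the anchors set by tresol (the first day of each month) are always preserved.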

type : “scalar” or “physical”, optional, default “scalar”: type of increments

“scalar”: (default) multiplicative increments. The control vector and the uncertainty matrix store unitless scaling factors
“physical”: additive increments. The control vector and the uncertainty matrix store the values in the original prior data set

xb_scale : float, optional: a scalar to apply to the prior before any computation

xb_value : float, optional: an offset to apply to the prior before any computation

err : float, optional: scaling factor to apply to the prior to compute the standard deviation of prior uncertainties.

err_type : “max” or “avg”, optional, default “avg”: complement to err; approach used to compute prior uncertainties from prior values; used only when type = physical:

“max”: Take the maximum prior value of the surrounding grid cells and scale it by err.
“avg”: (default) Take the average prior value of all the spatial extent of the prior data and scale it by err.

lower_bound : float, optional: lower boundary for the value of this control variable

upper_bound : float, optional: upper boundary for the value of this control variable.

glob_err : optional

used only when type = physical. Can be used to specify a total error for the spatial extent of the prior. The standard deviation of each spatial component of the control vector is scaled so that the total error (accounting for the horizontal correlations, if any) matches the one specified

Argument structure:

total : float, mandatory: the area-weighted sum of all prior values is scaled according to this value

unit_scale : float, optional, default 1: scaling factor to apply to the sum of prior values. Use if the value specified in total is not in the same unit as the one in the prior values

surface_unit : bool, optional, default False: set to True if the total value is given per unit of surface

frequency_unit : bool, optional, default False: set to True if the total value is given per unit of time

account_correlations : bool, optional, default True: account or not for correlations to compute the total errors, i.e. also summing non-diagonal terms of the covariance matrix
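The rescaling described for glob_err can be illustrated with a small worked sketch. It assumes that the total error is the square root of the area-weighted sum of all covariance terms, and that a single common factor is applied to every standard deviation; the function names, the toy values and this exact definition are assumptions for illustration, not the pyCIF implementation.

```python
import math


def total_error(sigmas, areas, corr=None):
    """Total error of an area-weighted sum of control variables.

    `corr` is an optional correlation matrix (list of lists). Without
    it, only the diagonal terms are summed, which mimics
    account_correlations = False.
    """
    weighted = [s * a for s, a in zip(sigmas, areas)]
    n = len(weighted)
    if corr is None:
        return math.sqrt(sum(w * w for w in weighted))
    return math.sqrt(
        sum(weighted[i] * corr[i][j] * weighted[j]
            for i in range(n) for j in range(n))
    )


def rescale_sigmas(sigmas, areas, target, corr=None):
    """Scale all standard deviations by one factor to reach `target`."""
    factor = target / total_error(sigmas, areas, corr)
    return [s * factor for s in sigmas]


# Toy values (hypothetical): two pixels with a 0.5 correlation
sigmas = [1.0, 2.0]
areas = [1.0, 1.0]
corr = [[1.0, 0.5], [0.5, 1.0]]

scaled = rescale_sigmas(sigmas, areas, target=1.0, corr=corr)
print(total_error(scaled, areas, corr))
```

Because the non-diagonal terms add to the total, the same target leads to smaller individual standard deviations when positive correlations are accounted for than when they are ignored.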

lowlim_error : optional

lower limit for the standard deviation of prior uncertainties. The threshold is computed using the physical values of the prior data

Argument structure:

err : float, mandatory: lower threshold for errors

unit_scale : float, optional, default 1: scaling factor to apply to prior values. Use if the value specified in err is not in the same unit as the one in the prior values

hcorrelations : optional

horizontal correlations. In most cases, the matrix B is not explicitly built. Instead, Kronecker products are used: horizontal correlations are applied to each temporal slice of the control vector

Argument structure:

sigma : float, optional: the horizontal correlation length in kilometers

landsea : bool, optional, default False: separate land and sea pixels

sigma_land : float, optional: the horizontal correlation length for land pixels

sigma_sea : float, optional: the horizontal correlation length for sea pixels

filelsm : str, optional: the path to the land-sea mask; it is a NetCDF with a variable lsm; ocean pixels are pixels with lsm < 0.5

dump_hcorr : bool, optional, default False: save horizontal correlations (as eigen vectors and values) for later use; they are saved in the folder $WORKDIR/controlvect/correlations/; the name of each file is: horcor_{hresol}_{nlon}x{nlat}_cs{sigma_sea}_cl{sigma_land}.bin; a suffix _lbc is appended if correlations are computed for a lateral boundary condition component

dircorrel : str, optional: where to look for pre-computed correlations; files are looked for in the folder following the same format as for dump_hcorr

evalmin : float, optional, default 0: minimal value for eigen values to filter out

crop_chi : bool, optional, default False: if True, the regularized vector $\mathbf{\chi}$ has a reduced dimension (consistent with evalmin) compared to the full control vector

tcorrelations : optional

temporal correlations between the temporal components of the control vector

Argument structure:

multi_sigmas : bool, optional, default False

It is possible to convolve multiple temporal correlation lengths and types (see below). If multi_sigmas is True, add a sub-paragraph sigmas with multiple entries; for each entry (the name does not matter), specify sigma_t and type. This reads as follows:

tcorrelations:
  multi_sigmas: True
  sigmas:
    sigma1:
      type: isotrope
      sigma_t: "3D"
    sigma2:
      type: frequency
      freq: "1D"
      sigma_t: "10D"
    sigma3:
      type: category
      scale: "hourofday"
      sigma_t: "50D"

Note

Please note that if multi_sigmas is True, only the correlation values below sigmas will be accounted for.

sigmas : optional

temporal correlation lengths and types, to be used with multi_sigmas

Argument structure:

any_key : optional

correlation length and type

Argument structure:

sigma_t : float, mandatory: correlation length

type : str, mandatory: correlation type

sigma_t : str, optional: temporal correlation length; should be a pandas frequency string

type : “isotrope” or “frequency” or “category”, optional: the type of temporal correlation

“isotrope”: correlations are simply computed following the temporal distance: $r = \exp(-(\delta t / \sigma_t)^2)$
“frequency”: only control vector components separated by an exact multiple of the given frequency will be correlated, still following the same formula as for isotrope; for instance, if the frequency is 1D, only components at the same hour of the day will be correlated with each other
“category”: the temporal distance used in the correlation formula is computed by temporal category; accepted values are hourofday, dayofweek and monthofyear. For instance, with hourofday, a component at 12:00 on a given day will be more correlated with a component at 13:00 on another day than with a component at 18:00 on the same day
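The three correlation types can be sketched with the Python standard library as follows. The sketch assumes that correlations decay with temporal distance (negative exponent in the formula), that `frequency` correlates only components separated by an exact multiple of the frequency, and that the `hourofday` category measures distance as a plain difference in hours; all function names are hypothetical and the code is not taken from pyCIF.

```python
import math
from datetime import datetime, timedelta


def isotrope_corr(t1, t2, sigma_t):
    """Correlation decaying with the temporal distance between t1 and t2."""
    dt = abs((t1 - t2).total_seconds())
    return math.exp(-((dt / sigma_t.total_seconds()) ** 2))


def frequency_corr(t1, t2, sigma_t, freq):
    """Same formula, but only for dates separated by a multiple of freq."""
    dt = abs((t1 - t2).total_seconds())
    if dt % freq.total_seconds() != 0:
        return 0.0
    return isotrope_corr(t1, t2, sigma_t)


def hourofday_distance(t1, t2):
    """Assumed distance for the hourofday category: difference in hours."""
    return abs(t1.hour - t2.hour)


noon = datetime(2020, 1, 1, 12)
next_noon = noon + timedelta(days=1)
one_pm = noon + timedelta(hours=1)

print(isotrope_corr(noon, one_pm, timedelta(days=3)))
print(frequency_corr(noon, next_noon, timedelta(days=10), timedelta(days=1)))
print(frequency_corr(noon, one_pm, timedelta(days=10), timedelta(days=1)))
print(hourofday_distance(noon, datetime(2020, 1, 2, 13)))
```

With a frequency of one day, 12:00 and 13:00 on the same day are uncorrelated, while two consecutive noons are correlated according to the isotrope formula.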

dump_tcorr : bool, optional, default False: save temporal correlations (as eigen vectors and values) for later use; they are saved in the folder $WORKDIR/controlvect/correlations/; the name of each file is: tempcor_{datei}_{datef}_per{period}_ct{sigma_t}_{sigma_type}.bin; a suffix _lbc is appended if

dircorrel : str, optional: where to look for pre-computed correlations

evalmin : float, optional, default 0: minimal value for eigen values to filter out

crop_chi : None, optional, default False: if True, the regularized vector $\mathbf{\chi}$ has a reduced dimension (consistent with evalmin) compared to the full control vector

dump_physical : None, optional, default True: if True, dumps physical values in the control vector netcdf

bands_lat, bands_lon : list, optional: To be used with hresol = bands. Lists of latitudes/longitudes defining a chessboard for aggregating the pixels. The values are the edges of the bands, hence one needs N + 1 values for N bands

bands_i, bands_j : list, optional: To be used with hresol = ibands. Same as bands_lat / bands_lon but with column/row indexes
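The "N + 1 values for N bands" convention can be illustrated with a minimal sketch. The edge values below are arbitrary, the helper names are hypothetical, and the choice of half-open bands (lower edge included, upper edge excluded) is an assumption; pyCIF may treat pixels lying exactly on an edge differently.

```python
from bisect import bisect_right

bands_lon = [-10.0, 0.0, 10.0, 20.0]  # 4 edges -> 3 longitude bands
bands_lat = [40.0, 50.0, 60.0]        # 3 edges -> 2 latitude bands


def band_index(value, edges):
    """Index of the band containing value, or None outside the edges."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect_right(edges, value) - 1


def cell_index(lon, lat):
    """(lon band, lat band) of a pixel on the chessboard, or None."""
    i = band_index(lon, bands_lon)
    j = band_index(lat, bands_lat)
    if i is None or j is None:
        return None
    return i, j


n_cells = (len(bands_lon) - 1) * (len(bands_lat) - 1)
print(n_cells)                  # 6
print(cell_index(5.0, 45.0))    # (1, 0)
print(cell_index(25.0, 45.0))   # None
```

The same logic applies to bands_i / bands_j, with column/row indexes in place of coordinates.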

regions_infos : optional

To be used with hresol = regions. Information about the file to be read to define regions.

The region file can either follow the default format, a NetCDF file with a variable regions that has the same dimensions as the domain of the prior data, or the format of another data type recognized by pyCIF. In the latter case, a plugin sub-paragraph should be included in regions_infos

Argument structure:

dir : str, mandatory: Path where to find the region-defining file

file : str, mandatory: name of the file

plugin : mandatory

plugin used to read the region-defining file

Argument structure:

name : str, mandatory: name of the plugin

version : str, mandatory: version of the plugin

regions_lsm : bool, optional, default False: To be used with hresol = regions. Use the index of each region to determine land and ocean regions. Positive indexes are land regions. Negative and null indexes are ocean regions. This information is used to compute horizontal correlations if the correlation length is different for land and ocean.

Requirements#

The current plugin requires the following plugins to run properly:

Requirement name	Requirement type	Explicit definition	Any valid	Default name	Default version
domain	Domain	True	True	None	None
model	Model	True	True	None	None
components	DataStream	True	True	None	None

YAML template#

Please find below a template for a YAML configuration:

datavect:
  plugin:
    name: standard
    version: std
    type: datavect

  # Optional arguments
  dump_debug: XXXXX  # bool
  components:
    any_key:
      dir: XXXXX  # str
      file: XXXXX  # str
      varname: XXXXX  # str
      file_freq: XXXXX  # str
      split_freq: XXXXX  # str
      parameters:
        any_key:
          dir: XXXXX  # str
          file: XXXXX  # str
          varname: XXXXX  # str
          file_freq: XXXXX  # str
          split_freq: XXXXX  # str
          hresol: XXXXX  # hpixels|regions|bands|ibands|global
          vresol: XXXXX  # vpixels|kbands|column
          tresol: XXXXX  # str
          tsubresol: XXXXX  # None
          type: XXXXX  # scalar|physical
          xb_scale: XXXXX  # float
          xb_value: XXXXX  # float
          err: XXXXX  # float
          err_type: XXXXX  # max|avg
          lower_bound: XXXXX  # float
          upper_bound: XXXXX  # float
          glob_err:
            total: XXXXX  # float
            unit_scale: XXXXX  # float
            surface_unit: XXXXX  # bool
            frequency_unit: XXXXX  # bool
            account_correlations: XXXXX  # bool
          lowlim_error:
            err: XXXXX  # float
            unit_scale: XXXXX  # float
          hcorrelations:
            sigma: XXXXX  # float
            landsea: XXXXX  # bool
            sigma_land: XXXXX  # float
            sigma_sea: XXXXX  # float
            filelsm: XXXXX  # str
            dump_hcorr: XXXXX  # bool
            dircorrel: XXXXX  # str
            evalmin: XXXXX  # float
            crop_chi: XXXXX  # bool
          tcorrelations:
            multi_sigmas: XXXXX  # bool
            sigmas:
              any_key:
                sigma_t: XXXXX  # float
                type: XXXXX  # str
            sigma_t: XXXXX  # str
            type: XXXXX  # isotrope|frequency|category
          dump_tcorr: XXXXX  # bool
          dircorrel: XXXXX  # str
          evalmin: XXXXX  # float
          crop_chi: XXXXX  # None
          dump_physical: XXXXX  # None
          bands_lat, bands_lon: XXXXX  # list
          bands_i, bands_j: XXXXX  # list
          regions_infos:
            dir: XXXXX  # str
            file: XXXXX  # str
            plugin:
              name: XXXXX  # str
              version: XXXXX  # str
          regions_lsm: XXXXX  # bool
