standard / std

Description

This plugin initializes the control vector and computes all operations related to it.

The control vector is initialized according to the information specified in the data vector.

For each parameter in the data vector, the CIF recognizes the following primary arguments to define the corresponding part of the control vector:

  • hresol (mandatory): the horizontal resolution of the control vector.

    accepted values:

    • hpixels: use the native resolution of the corresponding data

    • regions: aggregate pixels into regions using a mask specified by the user

    • hbands: aggregate pixels by lon/lat bands

    • ibands: aggregate pixels by column/row index bands

    • global: optimize one factor for the whole spatial extent of the data

    Warning

    This argument determines whether the parameter is included in the control vector. All other arguments will be ignored if this one is not specified.

  • vresol (optional): the vertical resolution of the control vector.

    accepted values:

    • vpixels: use the native resolution of the corresponding data

    • kbands: aggregate pixels into vertical bands by level index

    • column (default): optimize one factor for the whole vertical extent of the data

  • tresol (optional): the main temporal resolution of the control vector. Should be a pandas frequency string. If not specified, a single increment is used for the full inversion window

  • tsubresol (optional): secondary temporal resolution for the control vector. If tsubresol is not a divider of tresol, the final temporal resolution keeps the tresol periods as anchors and splits each of them according to tsubresol, the last sub-period of each period being adjusted to fit the remaining time.

    For instance, if tresol is 1MS and tsubresol is 10D, the control vector has a monthly resolution with 3 sub-periods per month: the first two sub-periods are 10 days long, according to tsubresol, and the third sub-period fills the remaining days of the month, hence from 8 days (for a 28-day February) to 11 days (for 31-day months)

  • type (optional): type of increments:

    accepted values:

    • scalar (default): multiplicative increments. The control vector and the uncertainty matrix store unitless scaling factors

    • physical: additive increments. The control vector and the uncertainty matrix store the values in the original prior data set

  • xb_scale: a scalar to apply to the prior before any computation

  • xb_value: an offset to apply to the prior before any computation

  • err: scaling factor to apply to the prior to compute the standard deviation of prior uncertainties.

  • err_type (optional): complement to err; approach used to compute prior uncertainties from prior values; used only when type = physical:

    accepted values:

    • max: Take the maximum prior value of the surrounding grid cells and scale it by err.

    • avg (default): Take the average prior value of all the spatial extent of the prior data and scale it by err.

  • glob_err (optional): used only when type = physical. Can be used to specify a total error for the spatial extent of the prior. The standard deviation of each spatial component of the control vector is scaled so that the total error (accounting for the horizontal correlations, if any) matches the one specified

    structure:

    • total (mandatory): the area-weighted sum of all prior values is scaled according to this value.

    • unit_scale (optional, default is 1): scaling factor to apply to the sum of prior values. Use if the value specified in total is not in the same unit as the one in the prior values

  • lowlim_error (optional): lower limit for the standard deviation of prior uncertainties. The threshold is computed using the physical values of the prior data

    structure:

    • err (mandatory): lower threshold for errors

    • unit_scale (optional, default is 1): scaling factor to apply to prior values. Use if the value specified in err is not in the same unit as the one in the prior values

  • hcorrelations (optional): horizontal correlations. In most cases, the matrix B is not explicitly built. Instead, Kronecker products are used: the horizontal correlations are applied to each temporal slice of the control vector

    structure:

    • sigma: the horizontal correlation length

    • landsea (optional, default is False): separate land and sea pixels

    • sigma_land: the horizontal correlation length for land pixels

    • sigma_sea: the horizontal correlation length for sea pixels

    • filelsm: the path to the land-sea mask; it is a NetCDF with a variable lsm; ocean pixels are pixels with lsm < 0.5

    • dump_hcorr (optional, default is False): save horizontal correlations (as eigenvectors and eigenvalues) for later use; they are saved in the folder $WORKDIR/controlvect/correlations/; the name of each file is: horcor_{hresol}_{nlon}x{nlat}_cs{sigma_sea}_cl{sigma_land}.bin; a suffix _lbc is appended if correlations are computed for a lateral boundary condition component

    • dircorrel (optional): folder where to look for pre-computed correlations; the files in this folder are expected to follow the same naming format as for dump_hcorr

    • evalmin (optional, default is 0): threshold on eigenvalues; eigenvalues below this value are filtered out

    • crop_chi (optional, default is False): if True, the regularized vector \(\mathbf{\chi}\) has a reduced dimension (consistent with evalmin) compared to the full control vector

  • tcorrelations (optional): temporal correlations

    structure:

    • multi_sigmas (default is False): several temporal correlation lengths and types (see below) can be convolved. If multi_sigmas is True, add a sub-paragraph sigmas with multiple entries; for each entry (its name does not matter), specify sigma_t and type. This reads as follows:

      tcorrelations:
        multi_sigmas: True
        sigmas:
          sigma1:
            type: isotrope
            sigma_t: "3D"
          sigma2:
            type: frequency
            freq: "1D"
            sigma_t: "10D"
          sigma3:
            type: category
            scale: "hourofday"
            sigma_t: "50D"
      

      Note

      Please note that if multi_sigmas is True, only the correlation values below sigmas are taken into account.

    • sigma_t (mandatory): temporal correlation length; should be a pandas frequency string

    • type (mandatory): type of temporal correlation

      accepted values:

      • isotrope: correlations simply decrease with the temporal distance: \(r = \exp(-(\delta t / \sigma_t) ^ 2)\)

      • frequency: only control vector components separated by a period of exactly the given frequency are correlated, still following the same formula as for isotrope;

        for instance, if freq is 1D, only components at the same hour of the day are correlated with each other

      • category: the temporal distance used in the correlation formula is computed by temporal categories, as given by the scale argument

        accepted values: [hourofday, dayofweek, monthofyear]

        for instance, with hourofday, a component at 12:00 on a given day is more correlated with a component at 13:00 on another day than with a component at 18:00 on the same day

    • dump_tcorr (optional, default is False): save temporal correlations (as eigenvectors and eigenvalues) for later use; they are saved in the folder $WORKDIR/controlvect/correlations/; the name of each file is: tempcor_{datei}_{datef}_per{period}_ct{sigma_t}_{sigma_type}.bin; a suffix _lbc is appended if correlations are computed for a lateral boundary condition component

    • dircorrel (optional): where to look for pre-computed correlations

    • evalmin (optional, default is 0): threshold on eigenvalues; eigenvalues below this value are filtered out

    • crop_chi (optional, default is False): if True, the regularized vector \(\mathbf{\chi}\) has a reduced dimension (consistent with evalmin) compared to the full control vector
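
The isotrope and frequency correlation types described under tcorrelations can be illustrated with a short Python sketch. It is not taken from pyCIF: the function names are hypothetical, the correlation is assumed to decay as \(\exp(-(\delta t / \sigma_t)^2)\), and "separated by a period of exactly the given frequency" is read here as "separated by a whole number of freq periods". The category type is left out because its distance computation is not detailed in this page.

```python
import math
from datetime import datetime, timedelta


def isotrope_corr(t1, t2, sigma_t):
    """Correlation decaying with temporal distance (assumed form).

    Computes exp(-(dt / sigma_t)^2), where dt = t2 - t1 and sigma_t is a
    timedelta holding the correlation length.
    """
    ratio = (t2 - t1) / sigma_t  # timedelta / timedelta gives a float
    return math.exp(-(ratio ** 2))


def frequency_corr(t1, t2, sigma_t, freq):
    """Same formula, restricted to dates a whole number of `freq` apart.

    Assumption of this sketch: any other pair of dates is uncorrelated.
    """
    if (t2 - t1) % freq != timedelta(0):
        return 0.0
    return isotrope_corr(t1, t2, sigma_t)


if __name__ == "__main__":
    noon = datetime(2020, 1, 1, 12)
    one_day, ten_days = timedelta(days=1), timedelta(days=10)
    # Same hour on the next day: correlated under freq = 1D
    print(frequency_corr(noon, noon + one_day, ten_days, one_day))
    # Six hours later on the same day: not correlated under freq = 1D
    print(frequency_corr(noon, noon + timedelta(hours=6), ten_days, one_day))
```

Under these assumptions, two components 24 hours apart are correlated when freq is 1D, whereas two components 6 hours apart are not, whatever the value of sigma_t.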

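The way tsubresol splits each tresol period, as in the 1MS / 10D example given for tsubresol, can be reproduced with the Python standard library. This is a sketch under assumptions rather than the pyCIF implementation: split_month is a hypothetical helper, and the number of sub-periods is taken as the rounded ratio between the month length and the sub-period length, a rule chosen only because it yields the three sub-periods of the example.

```python
from datetime import date, timedelta


def split_month(year, month, sub_days=10):
    """Return the lengths (in days) of the sub-periods of one calendar month.

    The month start is kept as an anchor; every sub-period is `sub_days`
    long except the last one, which absorbs the remaining days of the month.
    """
    start = date(year, month, 1)
    # First day of the following month (handles December correctly)
    end = date(year + month // 12, month % 12 + 1, 1)
    n_days = (end - start).days
    # Assumption: number of sub-periods = rounded ratio, at least one
    n_sub = max(round(n_days / sub_days), 1)
    edges = [start + timedelta(days=i * sub_days) for i in range(n_sub)]
    edges.append(end)
    return [(b - a).days for a, b in zip(edges[:-1], edges[1:])]


if __name__ == "__main__":
    for month in (1, 2, 4):
        print(month, split_month(2021, month))
```

For instance, split_month(2021, 2) gives sub-periods of 10, 10 and 8 days, and split_month(2021, 1) gives 10, 10 and 11 days.
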
Depending on the choice of primary arguments, secondary arguments may be specified. The expression in brackets gives the primary argument value that triggers the use of the corresponding secondary argument:

  • bands_lat / bands_lon (hresol = hbands): lists of latitudes/longitudes defining a chess-board for aggregating the pixels. The values are the edges of each band, hence N + 1 values are needed for N bands

  • bands_i / bands_j (hresol = ibands): same as bands_lat / bands_lon but with column/row indexes

  • regions_infos (hresol = regions): information about the file to be read to define regions.

    The region file can either follow the default format, which is a NetCDF file with a variable regions that has the same dimensions as the domain of the prior data, or use the format of another data type recognized by pyCIF. In the latter case, a plugin sub-paragraph should be included in regions_infos

    structure:

    • dir: Path where to find the region-defining file

    • file: name of the file

    • plugin:

      • name: name of the plugin

      • version: version of the plugin

  • regions_lsm (hresol = regions): use the index of each region to determine land and ocean regions. Positive indexes are land regions. Negative and null indexes are ocean regions. This information is used to compute horizontal correlations if the correlation length is different for land and ocean.
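
The "N + 1 values for N bands" convention of bands_lon / bands_lat can be made concrete with a small Python sketch. It is only an illustration: band_index and region_index are hypothetical helpers, and both the numbering of the chess-board cells (row by row, starting from the first latitude band) and the treatment of points lying exactly on an edge are assumptions, not pyCIF behaviour.

```python
from bisect import bisect_right


def band_index(value, edges):
    """Index of the band containing `value`, given N + 1 sorted edges."""
    if not edges[0] <= value <= edges[-1]:
        raise ValueError(f"{value} is outside [{edges[0]}, {edges[-1]}]")
    # bisect_right counts the edges <= value; the clamp assigns the very
    # last edge to the last band instead of a non-existent extra band
    return min(bisect_right(edges, value) - 1, len(edges) - 2)


def region_index(lon, lat, bands_lon, bands_lat):
    """Index of the chess-board cell containing the point (lon, lat)."""
    n_lon = len(bands_lon) - 1  # N + 1 edges define N bands
    return band_index(lat, bands_lat) * n_lon + band_index(lon, bands_lon)


# 4 longitude edges and 3 latitude edges: 3 x 2 = 6 aggregated regions
bands_lon = [-180, -60, 60, 180]
bands_lat = [-90, 0, 90]
n_regions = (len(bands_lon) - 1) * (len(bands_lat) - 1)

if __name__ == "__main__":
    print(n_regions, region_index(2.3, 48.9, bands_lon, bands_lat))
```

Here a pixel centred at longitude 2.3 and latitude 48.9 falls in the second longitude band and the second latitude band of the six regions.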

Yaml arguments

The following arguments are used to configure the plugin. pyCIF raises an exception at initialization if mandatory arguments are not specified, or if any argument does not fit the accepted values or type:

Optional arguments

save_out_netcdf: (optional): False

Save NetCDF format in addition to pickle when saving the control vector

accepted type: <class ‘bool’>

reduced_chi: (optional): False

The Chi space can be reduced by clipping the eigenvectors. Beware that this is an approximation: it may save some memory and accelerate the convergence of variational inversions, but it may miss some correlation structures

accepted type: <class ‘bool’>

save_full_B: (optional): False

Force dumping the full B matrix.

Warning

Beware of the size of your problem. The full B matrix may be too big to be explicitly defined and stored

accepted type: <class ‘bool’>

Requirements

The current plugin requires the following plugins to run properly:

Requirement name   Requirement type   Explicit definition   Any valid   Default name   Default version
domain             Domain             False                 True        None           None
model              Model              False                 True        None           None
datavect           DataVect           True                  True        standard       std

Yaml template

Please find below a template for a Yaml configuration:

controlvect:
  plugin:
    name: standard
    version: std
    type: controlvect


  # Optional arguments
  save_out_netcdf: XXXXX
  reduced_chi: XXXXX
  save_full_B: XXXXX