Standard CIF control vector `standard/std`

Description

This plugin initializes the control vector and computes all operations related to the control vector.

The control vector is initialized according to the information specified in the data vector.

Control vector related arguments in the data vector

For each parameter in the data vector, the CIF recognizes the following primary arguments to define the corresponding part of the control vector.

hresol : “hpixels” or “regions” or “bands” or “ibands” or “global”, optional: the horizontal resolution of the control vector.

Warning

This argument determines whether the parameter is included in the control vector. All other arguments will be ignored if this one is not specified.

“hpixels”: use the native resolution of the corresponding data
“regions”: aggregate pixels into regions using a mask specified by the user
“bands”: aggregate pixels by lon/lat bands
“ibands”: aggregate pixels by column/row index bands
“global”: optimize one factor for the whole spatial extent of the data

vresol : “vpixels” or “kbands” or “column”, optional, default “column”: the vertical resolution of the control vector.

“vpixels”: use the native resolution of the corresponding data
“kbands”: aggregate pixels into vertical bands by level index
“column”: (default) optimize one factor for the whole vertical extent of the data

tresol : str, optional: the main temporal resolution of the control vector. Should be a pandas frequency string (e.g., 1MS). If not specified, a single increment covers the full inversion window.

tsubresol : str, optional

secondary temporal resolution of the control vector; should be a pandas frequency string. If tsubresol does not evenly divide tresol, the periods defined by tresol are kept as anchors and each of them is split into sub-periods of length tsubresol, the last sub-period of each period being adjusted to fill the remaining time.

For instance, if tresol is 1MS and tsubresol is 10D, the control vector has a monthly resolution with 3 sub-periods per month: the first two sub-periods are 10 days long according to tsubresol, and the third sub-period fills the remaining days of the month, i.e., between 8 days (February in non-leap years) and 11 days (31-day months).
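
The splitting rule can be sketched with the Python standard library. This is not pyCIF's implementation: the function names are hypothetical, durations are restricted to whole days, and the rule that a trailing chunk shorter than half of tsubresol is merged into the previous sub-period is an assumption chosen to reproduce the example above.

```python
from datetime import date, timedelta


def split_period(start, end, sub_days):
    """Split [start, end) into sub-periods of `sub_days` days.

    Assumption (not taken from pyCIF): a trailing chunk shorter than half
    of `sub_days` is merged into the previous sub-period.
    """
    step = timedelta(days=sub_days)
    bounds = []
    cur = start
    while cur < end:
        bounds.append(cur)
        cur += step
    bounds.append(end)
    # Merge a short trailing chunk into the preceding sub-period
    if len(bounds) > 2 and bounds[-1] - bounds[-2] < step / 2:
        del bounds[-2]
    return list(zip(bounds[:-1], bounds[1:]))


def lengths(periods):
    """Length in days of each (start, end) sub-period."""
    return [(end - start).days for start, end in periods]


# tresol = 1MS, tsubresol = 10D
print(lengths(split_period(date(2021, 1, 1), date(2021, 2, 1), 10)))
print(lengths(split_period(date(2021, 2, 1), date(2021, 3, 1), 10)))
```

With these inputs, January is split into sub-periods of 10, 10 and 11 days and February 2021 into 10, 10 and 8 days.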

type : “scalar” or “physical”, optional, default “scalar”: type of increments

“scalar”: (default) multiplicative increments. The control vector and the uncertainty matrix store unitless scaling factors
“physical”: additive increments. The control vector and the uncertainty matrix store the values in the original prior data set
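
The difference between the two increment types can be illustrated with a short sketch. The function names are hypothetical and the conventions (prior scaling factors equal to 1 in scalar mode, prior values stored as-is in physical mode) are assumptions based on the description above, not pyCIF code.

```python
def prior_control_values(prior, inc_type="scalar"):
    """Prior control vector xb for one parameter (assumed convention)."""
    if inc_type == "scalar":
        return [1.0] * len(prior)  # unitless scaling factors
    if inc_type == "physical":
        return list(prior)  # values in the units of the prior data set
    raise ValueError(f"unknown type: {inc_type}")


def to_physical(prior, x, inc_type="scalar"):
    """Convert control values x back to physical values."""
    if inc_type == "scalar":
        return [p * xi for p, xi in zip(prior, x)]  # multiplicative
    if inc_type == "physical":
        return list(x)  # x already holds physical values
    raise ValueError(f"unknown type: {inc_type}")


prior = [2.0, 4.0]
for kind in ("scalar", "physical"):
    xb = prior_control_values(prior, kind)
    x = [v + 0.5 for v in xb]  # same additive step of 0.5 in control space
    print(kind, to_physical(prior, x, kind))
```

The same step of 0.5 in control space raises every pixel by 50% in scalar mode, but by 0.5 units of the prior data in physical mode.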

xb_scale : float, optional: a scalar to apply to the prior before any computation

xb_value : float, optional: an offset to apply to the prior before any computation

err : float, optional: scaling factor to apply to the prior to compute the standard deviation of prior uncertainties.

err_type : “max” or “avg”, optional, default “avg”: complement to err; approach used to compute prior uncertainties from prior values; used only when type = physical:

“max”: Take the maximum prior value of the surrounding grid cells and scale it by err.
“avg”: (default) Take the average prior value over the whole spatial extent of the prior data and scale it by err.
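
A minimal sketch of how err and err_type could turn prior values into standard deviations on a 2D grid. Two points are assumptions, not documented behaviour: in scalar mode the uncertainty on the unitless factors is taken to be err itself, and "surrounding grid cells" is interpreted as the 3x3 neighbourhood of each cell.

```python
def prior_std(prior, err, inc_type="scalar", err_type="avg"):
    """Standard deviations of prior uncertainties on a 2D grid (list of rows)."""
    ny, nx = len(prior), len(prior[0])
    if inc_type == "scalar":
        # Assumption: the uncertainty on the unitless factors is err itself
        return [[err] * nx for _ in range(ny)]
    if err_type == "avg":
        # Average over the whole spatial extent, scaled by err
        mean = sum(sum(row) for row in prior) / (nx * ny)
        return [[err * mean] * nx for _ in range(ny)]
    if err_type == "max":
        # Maximum over the 3x3 neighbourhood (hypothetical reading of "surrounding")
        out = []
        for j in range(ny):
            row = []
            for i in range(nx):
                neigh = [prior[jj][ii]
                         for jj in range(max(j - 1, 0), min(j + 2, ny))
                         for ii in range(max(i - 1, 0), min(i + 2, nx))]
                row.append(err * max(neigh))
            out.append(row)
        return out
    raise ValueError(f"unknown err_type: {err_type}")


print(prior_std([[1.0, 0.0, 0.0, 3.0]], 0.5, "physical", "max"))
```

The "avg" option gives every pixel the same uncertainty, whereas "max" keeps some spatial structure while avoiding near-zero errors next to strong sources.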

lower_bound : float, optional: lower boundary for the value of this control variable

upper_bound : float, optional: upper boundary for the value of this control variable.

glob_err : optional

Used only when type = physical. Can be used to specify a total error for the spatial extent of the prior. The standard deviation of each spatial component of the control vector is scaled so that the total error (accounting for horizontal correlations, if any) matches the specified value.

Argument structure:

total : float, mandatory: the area-weighted sum of all prior values is scaled according to this value

unit_scale : float, optional, default 1: scaling factor to apply to the sum of prior values. Use if the value specified in total is not in the same unit as the one in the prior values

surface_unit : bool, optional, default False: set to True if the total value is given per unit of surface

frequency_unit : bool, optional, default False: set to True if the total value is given per unit of time

account_correlations : bool, optional, default True: whether to account for correlations when computing the total error, i.e., whether to also sum the off-diagonal terms of the covariance matrix
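
The rescaling behind glob_err can be sketched as follows: compute the standard deviation of the area-weighted sum of the components, then multiply every standard deviation by one common factor so that this total equals the target. The function names are hypothetical, and surface_unit and frequency_unit are deliberately left out; this is an illustration of the idea, not pyCIF's code.

```python
import math


def total_error(sigmas, areas, unit_scale=1.0, corr=None):
    """Standard deviation of the area-weighted sum of all components.

    With a correlation matrix, off-diagonal terms are included
    (account_correlations = True); otherwise only the diagonal is summed.
    """
    w = [a * s * unit_scale for a, s in zip(areas, sigmas)]
    n = len(w)
    if corr is None:
        var = sum(wi * wi for wi in w)
    else:
        var = sum(w[i] * w[j] * corr[i][j] for i in range(n) for j in range(n))
    return math.sqrt(var)


def scale_to_total(sigmas, areas, total, unit_scale=1.0, corr=None):
    """Rescale all standard deviations so that total_error(...) equals total."""
    factor = total / total_error(sigmas, areas, unit_scale, corr)
    return [s * factor for s in sigmas]


sigmas = scale_to_total([1.0, 2.0], [1.0, 1.0], total=10.0)
print(sigmas, total_error(sigmas, [1.0, 1.0]))
```

Because one factor is applied to all components, the relative spatial pattern of the uncertainties is preserved; accounting for positive correlations increases the computed total and therefore lowers the factor.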

lowlim_error : optional

lower limit for the standard deviation of prior uncertainties. The threshold is computed using the physical values of the prior data

Argument structure:

err : float, mandatory: lower threshold for errors

unit_scale : float, optional, default 1: scaling factor to apply to prior values. Use if the value specified in err is not in the same unit as the one in the prior values
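
A minimal sketch of the lower limit, assuming the standard deviations are already physical (in prior units) and that multiplying them by unit_scale converts them to the units of err. In scalar mode the physical standard deviation would first have to be obtained by multiplying the factor uncertainty by the prior value; that step is omitted here. The function name is hypothetical.

```python
def lower_limit_physical(sigmas, err, unit_scale=1.0):
    """Floor physical standard deviations at `err`.

    `sigmas` are in prior units; multiplying by `unit_scale` converts them
    to the units of `err` (assumed convention).
    """
    return [max(s * unit_scale, err) / unit_scale for s in sigmas]


print(lower_limit_physical([0.1, 2.0], err=0.5))
print(lower_limit_physical([0.0001, 0.002], err=1.0, unit_scale=1000.0))
```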

hcorrelations : optional

Horizontal correlations. In most cases, the matrix B is not built explicitly; instead, it is expressed through Kronecker products, and horizontal correlations are applied to each temporal slice of the control vector.

Argument structure:

sigma : float, optional: the horizontal correlation length in kilometers

landsea : bool, optional, default False: separate land and sea pixels

sigma_land : float, optional: the horizontal correlation length for land pixels

sigma_sea : float, optional: the horizontal correlation length for sea pixels

filelsm : str, optional: the path to the land-sea mask; it is a NetCDF file with a variable lsm; ocean pixels are pixels with lsm < 0.5

dump_hcorr : bool, optional, default False: save horizontal correlations (as eigenvectors and eigenvalues) for later use; they are saved in the folder $WORKDIR/controlvect/correlations/; the name of each file is: horcor_{hresol}_{nlon}x{nlat}_cs{sigma_sea}_cl{sigma_land}.bin; a suffix _lbc is appended if correlations are computed for a lateral boundary condition component

dircorrel : str, optional: where to look for pre-computed correlations; file names in this folder follow the same format as for dump_hcorr

evalmin : float, optional, default 0: minimal eigenvalue; eigenvalues below this threshold are filtered out

crop_chi : bool, optional, default False: if True, the regularized vector $\mathbf{\chi}$ has a reduced dimension (consistent with evalmin) compared to the full control vector
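
The effect of landsea, sigma_land and sigma_sea can be illustrated by building a small horizontal correlation matrix. This section does not state the correlation function, so the exponential decay exp(-d / sigma) with great-circle distances in kilometers is an assumption, as are the function names; eigen-decomposition and evalmin filtering are not shown.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometers between two points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def horizontal_correlations(pixels, sigma=None, landsea=False,
                            sigma_land=None, sigma_sea=None):
    """Correlation matrix for pixels given as (lat, lon, is_land) tuples."""
    n = len(pixels)
    corr = [[0.0] * n for _ in range(n)]
    for i, (lat_i, lon_i, land_i) in enumerate(pixels):
        for j, (lat_j, lon_j, land_j) in enumerate(pixels):
            if landsea:
                if land_i != land_j:
                    continue  # land and sea pixels are left uncorrelated
                s = sigma_land if land_i else sigma_sea
            else:
                s = sigma
            d = great_circle_km(lat_i, lon_i, lat_j, lon_j)
            corr[i][j] = math.exp(-d / s)  # assumed exponential decay
    return corr


pixels = [(0.0, 0.0, True), (1.0, 0.0, True), (0.0, 1.0, False)]
corr = horizontal_correlations(pixels, landsea=True, sigma_land=200.0, sigma_sea=500.0)
```

With landsea enabled, the matrix is block-structured: land and sea pixels never correlate, which is what allows distinct correlation lengths for each surface type.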

tcorrelations : optional

Temporal correlations between components of the control vector.

Argument structure:

multi_sigmas : bool, optional, default False

It is possible to convolve multiple temporal correlation lengths and types (see below). If multi_sigmas is True, add a sub-paragraph sigmas with multiple entries; for each entry (the name of the entry has no importance), specify sigma_t and type. This reads as follows:

tcorrelations:
  multi_sigmas: True
  sigmas:
    sigma1:
      type: isotrope
      sigma_t: "3D"
    sigma2:
      type: frequency
      freq: "1D"
      sigma_t: "10D"
    sigma3:
      type: category
      scale: "hourofday"
      sigma_t: "50D"

Note

Please note that if multi_sigmas is True, only the correlation parameters defined under sigmas are accounted for.

sigmas : optional

temporal correlation lengths and types, to be used with multi_sigmas

Argument structure:

any_key : optional

correlation length and type

Argument structure:

sigma_t : str, mandatory: correlation length; should be a pandas frequency string

type : str, mandatory: correlation type

sigma_t : str, optional: temporal correlation length; should be a pandas frequency string

type : “isotrope” or “frequency” or “category”, optional: the type of temporal correlation

“isotrope”: correlations are simply computed from the temporal distance: $r = \exp(-(\delta t / \sigma_t)^2)$
“frequency”: only control vector components separated by an exact multiple of the given frequency are correlated, still following the same formula as for isotrope; for instance, if freq = 1D, only components at the same hour of the day are correlated with each other
“category”: the temporal distance used in the correlation formula is computed on temporal categories; accepted values are hourofday, dayofweek and monthofyear. For instance, with hourofday, a component at 12:00 on a given day is more correlated with a component at 13:00 on another day than with a component at 18:00 on the same day

dump_tcorr : bool, optional, default False: save temporal correlations (as eigenvectors and eigenvalues) for later use; they are saved in the folder $WORKDIR/controlvect/correlations/; the name of each file is: tempcor_{datei}_{datef}_per{period}_ct{sigma_t}_{sigma_type}.bin; a suffix _lbc is appended if correlations are computed for a lateral boundary condition component

dircorrel : str, optional: where to look for pre-computed correlations

evalmin : float, optional, default 0: minimal eigenvalue; eigenvalues below this threshold are filtered out

crop_chi : bool, optional, default False: if True, the regularized vector $\mathbf{\chi}$ has a reduced dimension (consistent with evalmin) compared to the full control vector
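
The three correlation types can be sketched for a pair of dates with the standard library. The function names are hypothetical; isotrope follows $r = \exp(-(\delta t / \sigma_t)^2)$, "frequency" keeps only pairs separated by an exact multiple of freq, and the category example assumes a cyclic hour-of-day distance with sigma expressed in hours, which is an illustrative choice rather than pyCIF's definition.

```python
import math
from datetime import datetime, timedelta


def isotrope(t1, t2, sigma_t):
    """Gaussian decay with the temporal distance; sigma_t is a timedelta."""
    ratio = abs(t1 - t2) / sigma_t  # timedelta / timedelta gives a float
    return math.exp(-ratio ** 2)


def frequency(t1, t2, sigma_t, freq):
    """Correlate only components separated by an exact multiple of freq."""
    if abs(t1 - t2) % freq != timedelta(0):
        return 0.0
    return isotrope(t1, t2, sigma_t)


def category_hourofday(t1, t2, sigma_hours):
    """Distance measured between hours of the day (cyclic, in hours)."""
    dh = abs(t1.hour - t2.hour)
    dh = min(dh, 24 - dh)  # assumption: 23:00 and 01:00 are 2 hours apart
    return math.exp(-(dh / sigma_hours) ** 2)


t0 = datetime(2021, 1, 1, 12)
print(frequency(t0, t0 + timedelta(days=2), timedelta(days=10), timedelta(days=1)))
print(frequency(t0, t0 + timedelta(days=2, hours=1), timedelta(days=10), timedelta(days=1)))
```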

dump_physical : bool, optional, default True: if True, dumps physical values in the control vector NetCDF file

Depending on the choice of primary arguments, secondary arguments may be specified. Each entry below states which value of a primary argument triggers its use:

bands_lat, bands_lon : list, optional: To be used with hresol = bands. Lists of latitudes/longitudes defining a chess-board for aggregating the pixels. The values are the edges of the bands, hence N + 1 values are needed for N bands

bands_i, bands_j : list, optional: To be used with hresol = ibands. Same as bands_lat / bands_lon, but with column/row indexes
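
A sketch of how N + 1 band edges define a chess-board of regions, using bisect from the standard library. Assigning pixels by their center coordinates, closing the last band on its upper edge and numbering regions latitude-major are assumptions made for illustration; the function names are hypothetical.

```python
from bisect import bisect_right


def band_index(value, edges):
    """Band index of `value` given N + 1 ascending edges, or None if outside."""
    if value < edges[0] or value > edges[-1]:
        return None
    # Values on the last edge are assigned to the last band
    return min(bisect_right(edges, value) - 1, len(edges) - 2)


def chessboard_region(lat, lon, bands_lat, bands_lon):
    """Region number of a pixel center in the lat/lon chess-board."""
    ilat = band_index(lat, bands_lat)
    ilon = band_index(lon, bands_lon)
    if ilat is None or ilon is None:
        return None
    return ilat * (len(bands_lon) - 1) + ilon


bands_lat = [-90.0, 0.0, 90.0]    # 3 edges -> 2 latitude bands
bands_lon = [-180.0, 0.0, 180.0]  # 3 edges -> 2 longitude bands
print(chessboard_region(45.0, 10.0, bands_lat, bands_lon))
```

With 3 edges on each axis, the domain is split into 2 x 2 = 4 regions.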

regions_infos : optional

To be used with hresol = regions. Information about the file to be read to define regions.

The region file can either follow a default format, i.e., a NetCDF file with a variable regions having the same dimensions as the domain of the prior data, or the format of another data type recognized by pyCIF. In the latter case, a plugin sub-paragraph should be included in regions_infos

Argument structure:

dir : str, mandatory: Path where to find the region-defining file

file : str, mandatory: name of the file

plugin : mandatory

plugin used to read the region-defining file

Argument structure:

name : str, mandatory: name of the plugin

version : str, mandatory: version of the plugin

regions_lsm : bool, optional, default False: To be used with hresol = regions. Use the index of each region to determine land and ocean regions: positive indexes are land regions; negative and null indexes are ocean regions. This information is used to compute horizontal correlations if the correlation lengths differ between land and ocean.
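
A minimal sketch of the regions logic on a small 2D regions array: the land/ocean flag follows the regions_lsm sign convention described above, and one scaling factor per region is broadcast back onto the pixel grid, which is how aggregation works for a scalar control vector under our reading. Function names are hypothetical.

```python
def region_is_land(regions):
    """Map each region index to True (land) or False (ocean).

    regions_lsm convention: positive indexes are land, null and negative
    indexes are ocean.
    """
    ids = sorted({r for row in regions for r in row})
    return {r: r > 0 for r in ids}


def expand_region_factors(factors, regions):
    """Broadcast one scaling factor per region onto the pixel grid."""
    return [[factors[r] for r in row] for row in regions]


regions = [[1, 1, -2], [0, 2, -2]]
print(region_is_land(regions))
print(expand_region_factors({1: 1.2, 2: 0.8, 0: 1.0, -2: 1.0}, regions))
```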

YAML arguments

The following arguments are used to configure the plugin. pyCIF raises an exception at initialization if mandatory arguments are not specified, or if any argument does not match the accepted values or type:

Optional arguments

save_out_netcdf : bool, optional, default False: Save NetCDF format in addition to pickle when saving the control vector

force_adj_netcdf : bool, optional, default False: Force saving sensitivities to the adjoint as netcdf

reduced_chi : bool, optional, default False: The chi space can be reduced by truncating the eigenvectors. Beware that this is an approximation: it may save memory and accelerate the convergence of variational inversions, but it may miss some correlation structures

save_full_B : bool, optional, default False: Force dumping the full B matrix.

Warning

Beware of the size of your problem. The full B matrix may be too big to be explicitly defined and stored
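
A quick size estimate shows why the full B matrix is rarely stored. The sketch assumes 8-byte floats, a single parameter, and a B factored into one horizontal and one temporal correlation matrix (the Kronecker approach mentioned for hcorrelations); the grid size and period count are made-up example values.

```python
def full_b_bytes(n_pixels, n_periods, bytes_per_value=8):
    """Memory needed to store the full B matrix for one parameter."""
    n = n_pixels * n_periods  # length of the control vector
    return n * n * bytes_per_value


def kronecker_bytes(n_pixels, n_periods, bytes_per_value=8):
    """Memory for one horizontal and one temporal correlation matrix."""
    return (n_pixels ** 2 + n_periods ** 2) * bytes_per_value


n_pix, n_per = 100 * 100, 365  # 100 x 100 grid, daily increments over one year
print(full_b_bytes(n_pix, n_per) / 2 ** 40, "TiB")
print(kronecker_bytes(n_pix, n_per) / 2 ** 30, "GiB")
```

For this example the full matrix needs about 97 TiB, whereas the two correlation factors fit in less than 1 GiB.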

reload_xb : bool, optional, default False: Load x from a pre-defined file

perturb_xb : bool, optional, default False: Perturb xb using B
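
Perturbing xb "using B" can be read as drawing x = xb + L z, where B = L L^T and z follows a standard normal distribution, so that x - xb has covariance B. The sketch below uses a plain Cholesky factorization for a small dense B; it is an illustration of that sampling technique, not pyCIF's implementation, which may rely on the eigen-decomposition of the correlations instead.

```python
import math
import random


def cholesky(b):
    """Lower-triangular L with L L^T = b (b symmetric positive definite)."""
    n = len(b)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(b[i][i] - s)
            else:
                low[i][j] = (b[i][j] - s) / low[j][j]
    return low


def perturb(xb, b, rng=random):
    """Return xb + L z with z ~ N(0, I), so the perturbation has covariance b."""
    low = cholesky(b)
    z = [rng.gauss(0.0, 1.0) for _ in xb]
    return [x + sum(low[i][k] * z[k] for k in range(i + 1))
            for i, x in enumerate(xb)]


x = perturb([1.0, 1.0], [[0.04, 0.02], [0.02, 0.09]], random.Random(42))
```

Passing a seeded random.Random instance makes the perturbation reproducible between runs.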

reload_file : str, optional, default “”: File from which to reload x

use_boundaries : bool, optional, default False: Define range of validity for the control variables

transform_pipe : optional

List of transformations to build the main observation operator pipeline

Argument structure:

any_key : optional

Name of a given transformation to be included. The name has no impact on the way the observation operator is computed, although it is recommended to use explicit names to help debugging.

Argument structure:

**args : optional: Arguments to set-up the given transform

Requirements

This plugin requires the following plugins to run properly:

Requirement name	Requirement type	Explicit definition	Any valid	Default name	Default version
domain	Domain	True	True	None	None
model	Model	True	True	None	None
datavect	DataVect	True	True	standard	std

YAML template

Please find below a template for a YAML configuration:

controlvect:
  plugin:
    name: standard
    version: std
    type: controlvect

  # Optional arguments
  save_out_netcdf: XXXXX  # bool
  force_adj_netcdf: XXXXX  # bool
  reduced_chi: XXXXX  # bool
  save_full_B: XXXXX  # bool
  reload_xb: XXXXX  # bool
  perturb_xb: XXXXX  # bool
  reload_file: XXXXX  # str
  use_boundaries: XXXXX  # bool
  transform_pipe:
    any_key:
      **args: XXXXX  # any
