standard / std
Description
This plugin initializes the control vector and computes all operations related to it.
The control vector is initialized according to the information specified in the data vector.
For each parameter in the data vector the following primary arguments are recognized by the CIF to define the corresponding part of the control vector.
hresol
(mandatory): the horizontal resolution of the control vector.
accepted values:
hpixels
: use the native resolution of the corresponding data
regions
: aggregate pixels into regions using a mask specified by the user
hbands
: aggregate pixels by lon/lat bands
ibands
: aggregate pixels by column/row index bands
global
: optimize one factor for the whole spatial extent of the data
Warning
This argument determines whether the parameter is included in the control vector. All other arguments will be ignored if this one is not specified.
vresol
(optional): the vertical resolution of the control vector.
accepted values:
vpixels
: use the native resolution of the corresponding data
kbands
: aggregate pixels into vertical bands by level index
column
(default): optimize one factor for the whole vertical extent of the data
tresol
(optional): the main temporal resolution of the control vector. Should be a pandas frequency string. If not specified, a single increment is used for the full inversion window.
tsubresol
(optional): secondary temporal resolution for the control vector. If tsubresol is not a divider of tresol, the final temporal resolution keeps tresol as anchors and then splits each period according to tsubresol, adjusting the size of the last subperiod of each period.
For instance, if tresol is 1MS and tsubresol is 10D, the control vector will have a monthly resolution with 3 subperiods per month: the first two subperiods are 10 days long according to tsubresol and the third subperiod fills the remaining days of the month, hence from 8 days (for February) to 11 days (for 31-day months).
type
(optional): type of increments.
accepted values:
scalar
(default): multiplicative increments. The control vector and the uncertainty matrix store unitless scaling factors
physical
: additive increments. The control vector and the uncertainty matrix store the values in the original prior data set
xb_scale
: a scalar to apply to the prior before any computation
xb_value
: an offset to apply to the prior before any computation
err
: scaling factor to apply to the prior to compute the standard deviation of prior uncertainties.
err_type
(optional): complement to err; approach used to compute prior uncertainties from prior values; used only when type = physical.
accepted values:
max
: take the maximum prior value of the surrounding grid cells and scale it by err.
avg
(default): take the average prior value over the whole spatial extent of the prior data and scale it by err.
glob_err
(optional): used only when type = physical. Can be used to specify a total error for the spatial extent of the prior. The standard deviation of each spatial component of the control vector is scaled so that the total error (accounting for the horizontal correlations, if any) matches the one specified.
structure:
total
(mandatory): the area-weighted sum of all prior values is scaled according to this value.
unit_scale
(optional, default is 1): scaling factor to apply to the sum of prior values. Use if the value specified in total is not in the same unit as the prior values.
lowlim_error
(optional): lower limit for the standard deviation of prior uncertainties. The threshold is computed using the physical values of the prior data.
structure:
err
(mandatory): lower threshold for errors
unit_scale
(optional, default is 1): scaling factor to apply to prior values. Use if the value specified in err is not in the same unit as the prior values.
hcorrelations
(optional): horizontal correlations. In most cases, the matrix B is not explicitly built. Instead, Kronecker products are used: horizontal correlations are applied to each temporal slice of the control vector.
structure:
sigma
: the horizontal correlation length
landsea
(optional, default is False): separate land and sea pixels
sigma_land
: the horizontal correlation length for land pixels
sigma_sea
: the horizontal correlation length for sea pixels
filelsm
: the path to the land-sea mask; it is a NetCDF file with a variable lsm; ocean pixels are pixels with lsm < 0.5
dump_hcorr
(optional, default is False): save horizontal correlations (as eigenvectors and eigenvalues) for later use; they are saved in the folder $WORKDIR/controlvect/correlations/; the name of each file is: horcor_{hresol}_{nlon}x{nlat}_cs{sigma_sea}_cl{sigma_land}.bin; a suffix _lbc is appended if correlations are computed for a lateral boundary condition component
dircorrel
(optional): where to look for precomputed correlations; files are looked for in the folder following the same format as for dump_hcorr
evalmin
(optional, default is 0): minimum eigenvalue; eigenvalues below this value are filtered out
crop_chi
(optional, default is False): if True, the regularized vector \(\mathbf{\chi}\) has a reduced dimension (consistent with evalmin) compared to the full control vector
tcorrelations
(optional): temporal correlations.
structure:
multi_sigmas
(default is False): it is possible to convolve multiple temporal correlation lengths and types (see below). If multi_sigmas is True, add a subparagraph sigmas with multiple entries; for each entry (the name has no importance), specify the sigma_t and type; this reads as follows:
tcorrelations:
  multi_sigmas: True
  sigmas:
    sigma1:
      type: isotrope
      sigma_t: "3D"
    sigma2:
      type: frequency
      freq: "1D"
      sigma_t: "10D"
    sigma3:
      type: category
      scale: "hourofday"
      sigma_t: "50D"
Note
Please note that if multi_sigmas is True, only the correlation values below sigmas will be accounted for.
sigma_t
(mandatory): temporal correlation length; should be a pandas frequency string
type
(mandatory): type of temporal correlation.
accepted values:
isotrope
: correlations are simply computed following the temporal distance: \(r = \exp(-(\delta t / \sigma_t)^2)\)
frequency
: only control vector components separated by a period of exactly the given frequency will be correlated, still following the same formula as for isotrope; for instance, if frequency = 1D, only components at the same hour of the day will be correlated with each other
category
: the temporal distance used in the correlation formula is calculated by temporal categories.
accepted values: [hourofday, dayofweek, monthofyear]
for instance, with hourofday, a component at 12:00 on a given day will be more correlated with a component at 13:00 on another day than with a component at 18:00 on the same day
dump_tcorr
(optional, default is False): save temporal correlations (as eigenvectors and eigenvalues) for later use; they are saved in the folder $WORKDIR/controlvect/correlations/; the name of each file is: tempcor_{datei}_{datef}_per{period}_ct{sigma_t}_{sigma_type}.bin; a suffix _lbc
is appended if
dircorrel
(optional): where to look for precomputed correlations
evalmin
(optional, default is 0): minimum eigenvalue; eigenvalues below this value are filtered out
crop_chi
(optional, default is False): if True, the regularized vector \(\mathbf{\chi}\) has a reduced dimension (consistent with evalmin) compared to the full control vector
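The temporal arguments described in this list (tresol, tsubresol and an isotrope temporal correlation) can be illustrated with a minimal sketch. It uses only the Python standard library and is not pyCIF's implementation: the helper names (month_starts, split_periods, isotrope_correlation) are hypothetical, the month/10-day case is hard-coded instead of parsing pandas frequency strings, the rule for merging a short trailing stub into the previous subperiod is an assumption chosen to reproduce the 1MS / 10D example, and the correlation is assumed to decay as exp(-(δt/σ_t)²).

```python
import math
from datetime import date, timedelta


def month_starts(first, last):
    """Return the first day of every month from `first` up to and including `last`."""
    starts, current = [], first.replace(day=1)
    while current <= last:
        starts.append(current)
        # Day 28 plus 4 days always lands in the next month; snap back to day 1.
        current = (current.replace(day=28) + timedelta(days=4)).replace(day=1)
    return starts


def split_periods(anchors, sub_days):
    """Split each [anchor, next anchor) window into subperiods of `sub_days` days.

    Assumption (not taken from pyCIF): a trailing stub shorter than half a
    subperiod is merged into the previous subperiod.
    """
    periods = []
    for start, end in zip(anchors[:-1], anchors[1:]):
        starts, current = [], start
        while current < end:
            starts.append(current)
            current += timedelta(days=sub_days)
        if len(starts) > 1 and (end - starts[-1]).days < sub_days / 2:
            starts.pop()
        bounds = starts + [end]
        periods.extend(zip(bounds[:-1], bounds[1:]))
    return periods


def isotrope_correlation(dates, sigma_days):
    """Assumed decaying kernel r = exp(-(dt / sigma_t)^2) between dates."""
    return [
        [math.exp(-(((a - b).days / sigma_days) ** 2)) for b in dates]
        for a in dates
    ]


# Monthly anchors (tresol = 1MS) split into 10-day subperiods (tsubresol = 10D).
anchors = month_starts(date(2019, 1, 1), date(2019, 3, 1))
periods = split_periods(anchors, sub_days=10)
lengths = [(end - start).days for start, end in periods]

# Correlations between subperiod start dates, with sigma_t = 10 days.
corr = isotrope_correlation([start for start, _ in periods], sigma_days=10)
```

With these inputs, lengths lists the subperiod durations in days for January and February 2019, and corr is a symmetric matrix with ones on the diagonal.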
Depending on the choice of primary arguments, secondary arguments may be specified. The expression between parentheses indicates the primary argument value that triggers the use of the corresponding secondary argument:
bands_lat / bands_lon
(hresol = hbands): a list of latitudes/longitudes defining a chessboard for aggregating the pixels. The values are the edges of each band, hence N + 1 values are needed for N bands.
bands_i / bands_j
(hresol = ibands): same as bands_lat / bands_lon but with column/row indexes
regions_infos
(hresol = regions): information about the file to be read to define regions.
The region file can either follow a default format, which is a NetCDF file with a variable regions (the variable should have the same dimensions as the domain of the prior data), or use the format of another data type recognized by pyCIF. In the latter case, a plugin subparagraph should be included in regions_infos.
structure:
dir
: path to the directory containing the region-defining file
file
: name of the file
plugin
:
name
: name of the plugin
version
: version of the plugin
regions_lsm
(hresol = regions): use the index of each region to determine land and ocean regions. Positive indexes are land regions. Negative and null indexes are ocean regions. This information is used to compute horizontal correlations if the correlation length is different for land and ocean.
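As an illustration of the band-based aggregation, the sketch below maps a pixel centre to a chessboard region from N + 1 band edges. It is a standalone example, not pyCIF code: the function names band_index and chessboard_region, the row-major numbering of regions, and the choice to treat the last edge as inclusive are all assumptions made for the example.

```python
from bisect import bisect_right


def band_index(value, edges):
    """Return the index of the band containing `value`.

    `edges` holds N + 1 increasing values delimiting N bands; band k covers
    [edges[k], edges[k + 1]). Assumption: the last edge is treated as
    inclusive so that the upper domain boundary falls in the last band.
    """
    if not edges[0] <= value <= edges[-1]:
        raise ValueError(f"{value} lies outside the bands {edges}")
    return min(bisect_right(edges, value) - 1, len(edges) - 2)


def chessboard_region(lon, lat, bands_lon, bands_lat):
    """Flatten the (latitude band, longitude band) pair into one region index."""
    n_lon = len(bands_lon) - 1
    return band_index(lat, bands_lat) * n_lon + band_index(lon, bands_lon)


bands_lon = [-180, -60, 60, 180]  # 4 edges define 3 longitude bands
bands_lat = [-90, 0, 90]          # 3 edges define 2 latitude bands

# A pixel centred at 10°E, 45°N falls in the second latitude band and the
# second longitude band of this 2 x 3 chessboard.
region = chessboard_region(10.0, 45.0, bands_lon, bands_lat)
```

The same helper applies to bands_i / bands_j by passing column/row indexes instead of coordinates.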
Yaml arguments
The following arguments are used to configure the plugin. pyCIF raises an exception at initialization if mandatory arguments are not specified, or if any argument does not match the accepted values or type:
Optional arguments
save_out_netcdf: (optional): False
Save NetCDF format in addition to pickle when saving the control vector
accepted type: <class 'bool'>
reduced_chi: (optional): False
The chi space can be reduced by truncating the eigenvectors. Beware that this is an approximation: it may save memory and accelerate the convergence of variational inversions, but may miss some correlation structures.
accepted type: <class 'bool'>
save_full_B: (optional): False
Force dumping the full B matrix.
Warning
Beware of the size of your problem. The full B matrix may be too big to be explicitly defined and stored.
accepted type: <class 'bool'>
Requirements
This plugin requires the following plugins to run properly:
Requirement name    Requirement type    Explicit definition    Any valid    Default name    Default version
domain                                  False                  True         None            None
model                                   False                  True         None            None
datavect                                True                   True         standard        std
Yaml template
Please find below a template for a Yaml configuration:
controlvect:
  plugin:
    name: standard
    version: std
    type: controlvect

  # Optional arguments
  save_out_netcdf: XXXXX
  reduced_chi: XXXXX
  save_full_B: XXXXX