Standard CIF data vector standard/std
Description
This is the standard pyCIF implementation of the datavect class. Information about inputs is split into component/parameter categories. These component/parameter categories are fully flexible in terms of names, but should be consistent with the rest of the configuration.
General component categories include for instance:
- concs: observed concentrations
- fluxes: emission fluxes
- inicond: initial conditions
- meteo: meteorological fields
For each component, multiple parameters can be defined depending on diverse species, sectors, etc.
The datavect object is used to define the controlvect and obsvect objects. Therefore, arguments complementary to those specific to the datavect can be used in each component/parameter. Please see details of such additional arguments here and here.
YAML arguments
The following arguments are used to configure the plugin. pyCIF will raise an exception at initialization if mandatory arguments are not specified, or if any argument does not fit the accepted values or type:
Optional arguments
- dump_debug : bool, optional, default False
Save extra information for debugging purposes. This includes the list of files and dates for each input, saved in $workdir/datavect/
- components : optional
List of components in the data vector
- Argument structure:
- any_key : optional
Name of a given component
- Argument structure:
- dir : str, optional, default “”
Path to the corresponding component. This value is used if not provided in parameters
- file : str, optional, default “”
File format in the given directory. This value is used if not provided in parameters
- varname : str, optional, default “”
Variable name to use when reading data files instead of the parameter name, if different from the parameter name
- file_freq : str, optional, default “”
Temporal frequency to fetch files
- split_freq : str, optional
Force splitting the processing at a given frequency different to file_freq
- parameters : optional
Store the list of parameters for this component
- Argument structure:
- any_key : optional
Name of a given parameter
- Argument structure:
- dir : str, optional, default “”
Path to the corresponding component. This value is used if not provided in parameters
- file : str, optional, default “”
File format in the given directory. This value is used if not provided in parameters
- varname : str, optional, default “”
Variable name to use when reading data files instead of the parameter name, if different from the parameter name
- file_freq : str, optional, default “”
Temporal frequency to fetch files
- split_freq : str, optional
Force splitting the processing at a given frequency different to file_freq
- hresol : “hpixels” or “regions” or “hbands” or “ibands” or “global”, optional
the horizontal resolution of the control vector.
Warning
This argument determines whether the parameter is included in the control vector. All other arguments will be ignored if this one is not specified.
“hpixels”: use the native resolution of the corresponding data
“regions”: aggregate pixels into regions using a mask specified by the user
“hbands”: aggregate pixels by lon/lat bands
“ibands”: aggregate pixels by column/row index bands
“global”: optimize one factor for the whole spatial extent of the data
- vresol : “vpixels” or “kbands” or “column”, optional, default “column”
the vertical resolution of the control vector.
“vpixels”: use the native resolution of the corresponding data
“kbands”: aggregate pixels into vertical bands by level index
“column”: (default) optimize one factor for the whole vertical extent of the data
- tresol : str, optional
the main temporal resolution of the control vector. Should be a pandas frequency string. If not specified, a single increment is used for the full inversion window
- tsubresol : None, optional
secondary resolution for the control vector. If tsubresol is not a divider of tresol, the final temporal resolution will keep tresol as anchors and then split them according to tsubresol, fitting the size of the last sub-period of each period. For instance, if tresol is 1MS and tsubresol is 10D, the control vector will have a monthly resolution with 3 sub-periods per month: the first two sub-periods are 10 days long according to tsubresol and the third sub-period fills the remaining days of the month, hence between 8 days (for February) and 11 days (for 31-day-long months)
- type : “scalar” or “physical”, optional, default “scalar”
type of increments
“scalar”: (default) multiplicative increments. The control vector and the uncertainty matrix store unitless scaling factors
“physical”: additive increments. The control vector and the uncertainty matrix store the values in the original prior data set
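The tresol / tsubresol splitting described above can be sketched with the standard library only. This is a minimal illustration, not pyCIF code: split_month is a hypothetical helper, and the rounding rule used to pick the number of sub-periods is an assumption chosen to reproduce the documented 1MS / 10D example.

```python
import calendar
from datetime import date, timedelta


def split_month(year, month, sub_days):
    """Split one calendar month into sub-periods of `sub_days` days.

    Hypothetical helper (not part of pyCIF). The month start is the anchor;
    the last sub-period is stretched or shortened to end with the month.
    """
    n_days = calendar.monthrange(year, month)[1]
    start = date(year, month, 1)
    end = start + timedelta(days=n_days)

    # Assumption: the number of sub-periods is the nearest integer to
    # n_days / sub_days, with at least one sub-period.
    n_sub = max(int(n_days / sub_days + 0.5), 1)

    # Regular edges every `sub_days`, then the month end as the final edge
    edges = [start + timedelta(days=i * sub_days) for i in range(n_sub)]
    edges.append(end)

    # Return (start date, length in days) for each sub-period
    return [(edges[i], (edges[i + 1] - edges[i]).days) for i in range(n_sub)]


if __name__ == "__main__":
    for month in (1, 2, 4):
        print(month, [length for _, length in split_month(2021, month, 10)])
```

With a 10-day sub-resolution, every month of 2021 yields three sub-periods, and only the length of the last one changes from month to month.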
- xb_scale : float, optional
a scalar to apply to the prior before any computation
- xb_value : float, optional
an offset to apply to the prior before any computation
- err : float, optional
scaling factor to apply to the prior to compute the standard deviation of prior uncertainties.
- err_type : “max” or “avg”, optional, default “avg”
complement to err; approach used to compute prior uncertainties from prior values; used only when type = physical:
“max”: Take the maximum prior value of the surrounding grid cells and scale it by err.
“avg”: (default) Take the average prior value over the whole spatial extent of the prior data and scale it by err.
- lower_bound : float, optional
lower boundary for the value of this control variable
- upper_bound : float, optional
upper boundary for the value of this control variable.
- glob_err : optional
used only when type = physical. Can be used to specify a total error for the spatial extent of the prior. The standard deviation of each spatial component of the control vector is scaled, so that the total error (accounting for the horizontal correlations if any) matches the one specified
- Argument structure:
- total : float, mandatory
the area-weighted sum of all prior values is scaled according to this value
- unit_scale : float, optional, default 1
scaling factor to apply to the sum of prior values. Use if the value specified in total is not in the same unit as the one in the prior values
- surface_unit : bool, optional, default False
set to True if the total value is given per unit of surface
- frequency_unit : bool, optional, default False
set to True if the total value is given per unit of time
- account_correlations : bool, optional, default True
account or not for correlations to compute the total errors, i.e. also summing non-diagonal terms of the covariance matrix
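The effect of account_correlations on the total error can be illustrated with a small sketch. It is a simplified illustration under stated assumptions: total_error and rescale_to_total are hypothetical helpers, area weighting and the unit_scale, surface_unit and frequency_unit options are ignored, and the target value is passed in directly rather than derived from total.

```python
import math


def total_error(stds, corr, account_correlations=True):
    """Standard deviation of the sum of all components.

    With correlations, the variance of the sum is sum_ij s_i * s_j * c_ij;
    without them, only the diagonal terms s_i ** 2 are summed.
    """
    n = len(stds)
    variance = 0.0
    for i in range(n):
        for j in range(n):
            if i == j or account_correlations:
                variance += stds[i] * stds[j] * corr[i][j]
    return math.sqrt(variance)


def rescale_to_total(stds, corr, target, account_correlations=True):
    """Apply one common factor so that the total error equals `target`."""
    factor = target / total_error(stds, corr, account_correlations)
    return [s * factor for s in stds]


if __name__ == "__main__":
    stds = [1.0, 2.0]
    corr = [[1.0, 0.5], [0.5, 1.0]]
    print(total_error(stds, corr, account_correlations=True))
    print(total_error(stds, corr, account_correlations=False))
    print(rescale_to_total(stds, corr, target=10.0))
```

Because positive correlations add non-diagonal terms to the variance of the sum, the same target leads to smaller individual standard deviations when correlations are accounted for.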
- lowlim_error : optional
lower limit for the standard deviation of prior uncertainties. The threshold is computed using the physical values of the prior data
- Argument structure:
- err : float, mandatory
lower threshold for errors
- unit_scale : float, optional, default 1
scaling factor to apply to prior values. Use if the value specified in err is not in the same unit as the one in the prior values
- hcorrelations : optional
horizontal correlations. In most cases, the matrix B is not explicitly built. Instead, Kronecker products are used: horizontal correlations are applied to each temporal slice of the control vector
- Argument structure:
- sigma : float, optional
the horizontal correlation length in kilometers
- landsea : bool, optional, default False
separate land and sea pixels
- sigma_land : float, optional
the horizontal correlation length for land pixels
- sigma_sea : float, optional
the horizontal correlation length for sea pixels
- filelsm : str, optional
the path to the land-sea mask; it is a NetCDF file with a variable lsm; ocean pixels are pixels with lsm < 0.5
- dump_hcorr : bool, optional, default False
save horizontal correlations (as eigenvectors and eigenvalues) for later use; they are saved in the folder $WORKDIR/controlvect/correlations/; the name of each file is: horcor_{hresol}_{nlon}x{nlat}_cs{sigma_sea}_cl{sigma_land}.bin; a suffix _lbc is appended if correlations are computed for a lateral boundary condition component
- dircorrel : str, optional
where to look for pre-computed correlations; files are looked for in the folder following the same format as for dump_hcorr
- evalmin : float, optional, default 0
minimal value for eigen values to filter out
- crop_chi : bool, optional, default False
if True, the regularized vector \(\mathbf{\chi}\) has a reduced dimension (consistent with evalmin) compared to the full control vector
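As a rough illustration of the land/sea options of hcorrelations, the sketch below classifies a pixel from its lsm value, picks the matching correlation length, and fills in the documented file name pattern. All function names are hypothetical, the way numbers are rendered in the file name is an assumption (pyCIF may format them differently), and the _lbc suffix is not handled.

```python
# File name pattern as given in the dump_hcorr description
HCORR_PATTERN = "horcor_{hresol}_{nlon}x{nlat}_cs{sigma_sea}_cl{sigma_land}.bin"


def is_ocean(lsm_value):
    """Ocean pixels are those with lsm < 0.5, as stated for filelsm."""
    return lsm_value < 0.5


def correlation_length(lsm_value, sigma=None, landsea=False,
                       sigma_land=None, sigma_sea=None):
    """Pick the correlation length (km) that applies to one pixel.

    Assumption: without landsea, a single sigma is used everywhere.
    """
    if not landsea:
        return sigma
    return sigma_sea if is_ocean(lsm_value) else sigma_land


def hcorr_filename(hresol, nlon, nlat, sigma_sea, sigma_land):
    """Fill the documented pattern; number formatting is an assumption."""
    return HCORR_PATTERN.format(
        hresol=hresol, nlon=nlon, nlat=nlat,
        sigma_sea=sigma_sea, sigma_land=sigma_land,
    )


if __name__ == "__main__":
    print(correlation_length(0.2, landsea=True, sigma_land=500, sigma_sea=1000))
    print(hcorr_filename("hpixels", 10, 20, 1000, 500))
```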
- tcorrelations : optional
temporal correlations
- Argument structure:
- multi_sigmas : bool, optional, default False
it is possible to convolve multiple temporal correlation lengths and types (see below). If multi_sigmas is True, add a sub-paragraph sigmas with multiple entries; for each entry (the name has no importance), specify the sigma_t and type; this reads as follows:
tcorrelations:
  multi_sigmas: True
  sigmas:
    sigma1:
      type: isotrope
      sigma_t: "3D"
    sigma2:
      type: frequency
      freq: "1D"
      sigma_t: "10D"
    sigma3:
      type: category
      scale: "hourofday"
      sigma_t: "50D"
Note
Please note that if multi_sigmas is True, only the correlation values below sigmas will be accounted for.
- sigmas : optional
temporal correlation lengths and types, to be used with multi_sigmas
- Argument structure:
- any_key : optional
correlation length and type
- Argument structure:
- sigma_t : float, mandatory
correlation length
- type : str, mandatory
correlation type
- sigma_t : str, optional
temporal correlation length; should be a pandas frequency string
- type : “isotrope” or “frequency” or “category”, optional
the type of temporal correlation
“isotrope”: correlations are simply computed following the temporal distance: \(r = \exp(-(\delta t / \sigma_t)^2)\)
“frequency”: only control vector components separated by a period of exactly the given frequency will be correlated, still following the same formula as for isotrope; for instance if frequency = 1D, only components at the same hour of the day will be correlated with each other
“category”: the temporal distance to apply the correlation formula is calculated by temporal categories; accepted values: [hourofday, dayofweek, monthofyear]; for instance, with hourofday, a component at 12:00 on a given day will be more correlated to a component at 13:00 on another day than to a component at 18:00 of the same day
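The three correlation types can be compared with a short standard-library sketch. It is illustrative only and not the pyCIF implementation: the function names are hypothetical, a decaying Gaussian-shaped correlation is assumed, "exactly the given frequency" is read as any whole multiple of that frequency, and for the hourofday category the distance is assumed to be the plain (non-circular) difference in hours of the day, with sigma expressed in hours.

```python
import math
from datetime import datetime, timedelta


def gaussian(ratio):
    """Correlation decaying with the squared distance / length ratio."""
    return math.exp(-(ratio ** 2))


def isotrope(t1, t2, sigma_t):
    """Correlation from the plain temporal distance between two dates."""
    return gaussian(abs(t2 - t1) / sigma_t)


def frequency(t1, t2, sigma_t, freq):
    """Same formula, but only for dates separated by a multiple of freq."""
    dt = abs(t2 - t1)
    if dt % freq != timedelta(0):
        return 0.0
    return gaussian(dt / sigma_t)


def category_hourofday(t1, t2, sigma_hours):
    """Distance measured between hours of the day, whatever the date."""
    h1 = t1.hour + t1.minute / 60.0
    h2 = t2.hour + t2.minute / 60.0
    return gaussian(abs(h2 - h1) / sigma_hours)


if __name__ == "__main__":
    noon = datetime(2020, 1, 1, 12)
    next_day_1pm = datetime(2020, 1, 2, 13)
    same_day_6pm = datetime(2020, 1, 1, 18)
    print(isotrope(noon, same_day_6pm, timedelta(days=3)))
    print(frequency(noon, same_day_6pm, timedelta(days=10), timedelta(days=1)))
    print(category_hourofday(noon, next_day_1pm, 6.0))
    print(category_hourofday(noon, same_day_6pm, 6.0))
```

Under these assumptions the category variant gives the 12:00 and 13:00 components a higher correlation than the 12:00 and 18:00 components, regardless of the day, which matches the behaviour described for hourofday.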
- dump_tcorr : bool, optional, default False
save temporal correlations (as eigenvectors and eigenvalues) for later use; they are saved in the folder $WORKDIR/controlvect/correlations/; the name of each file is: tempcor_{datei}_{datef}_per{period}_ct{sigma_t}_{sigma_type}.bin; a suffix _lbc is appended if
- dircorrel : str, optional
where to look for pre-computed correlations
- evalmin : float, optional, default 0
minimal value for eigen values to filter out
- crop_chi : None, optional, default False
if True, the regularized vector \(\mathbf{\chi}\) has a reduced dimension (consistent with evalmin) compared to the full control vector
- bands_lat, bands_lon : list, optional
To be used with hresol = hbands. A list of longitudes/latitudes defining a chess-board for aggregating the pixels. The values are the sides of each band, hence one needs N + 1 values for N bands
- bands_i, bands_j : list, optional
To be used with hresol = ibands. Same as bands_lat/bands_lon but with column/row indexes
- regions_infos : optional
To be used with hresol = regions. Information about the file to be read to define regions. The region file format can either follow a default format, which is a NetCDF file with a variable regions; the variable should have the same dimensions as the domain of the prior data. It is also possible to use the format of another data type as recognized by pyCIF. In that case, a plugin sub-paragraph should be included in regions_infos
- Argument structure:
- dir : str, mandatory
Path where to find the region-defining file
- file : str, mandatory
name of the file
- plugin : mandatory
plugin used to read the region-defining file
- Argument structure:
- name : str, mandatory
name of the plugin
- version : str, mandatory
version of the plugin
- regions_lsm : bool, optional, default False
To be used with hresol = regions. Use the index of each region to determine land and ocean regions. Positive indexes are land regions. Negative and null indexes are ocean regions. This information is used to compute horizontal correlations if the correlation length is different for land and ocean.
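For the band-based aggregation (bands_lat / bands_lon), the N + 1 edges for N bands rule can be sketched with a bisection lookup. This is an illustrative sketch, not pyCIF code: band_index and chessboard_cell are hypothetical helpers, the cell numbering (row-major, latitude bands first) is arbitrary, and treating the outermost edge as part of the last band is an assumption.

```python
from bisect import bisect_right


def band_index(value, edges):
    """Index of the band containing `value`, or None if outside the edges.

    `edges` holds N + 1 increasing values delimiting N bands.
    """
    if value < edges[0] or value > edges[-1]:
        return None
    # Assumption: the outermost edge belongs to the last band
    return min(bisect_right(edges, value) - 1, len(edges) - 2)


def chessboard_cell(lon, lat, bands_lon, bands_lat):
    """Map a pixel centre to one cell of the lon/lat chess-board."""
    i = band_index(lon, bands_lon)
    j = band_index(lat, bands_lat)
    if i is None or j is None:
        return None
    n_lon_bands = len(bands_lon) - 1
    return j * n_lon_bands + i


if __name__ == "__main__":
    bands_lon = [-180, -60, 60, 180]  # 4 edges -> 3 longitude bands
    bands_lat = [-90, 0, 90]          # 3 edges -> 2 latitude bands
    print(chessboard_cell(0.0, 45.0, bands_lon, bands_lat))
```

The same lookup applies to bands_i / bands_j when the edges are column/row indexes instead of coordinates.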
Requirements
The current plugin requires the following plugins to run properly:
Requirement name | Requirement type | Explicit definition | Any valid | Default name | Default version
---|---|---|---|---|---
domain | True | True | None | None |
model | True | True | None | None |
components | True | True | None | None |
YAML template
Please find below a template for a YAML configuration:
datavect:
  plugin:
    name: standard
    version: std
    type: datavect

  # Optional arguments
  dump_debug: XXXXX  # bool
  components:
    any_key:
      dir: XXXXX  # str
      file: XXXXX  # str
      varname: XXXXX  # str
      file_freq: XXXXX  # str
      split_freq: XXXXX  # str
      parameters:
        any_key:
          dir: XXXXX  # str
          file: XXXXX  # str
          varname: XXXXX  # str
          file_freq: XXXXX  # str
          split_freq: XXXXX  # str
          hresol: XXXXX  # hpixels|regions|hbands|ibands|global
          vresol: XXXXX  # vpixels|kbands|column
          tresol: XXXXX  # str
          tsubresol: XXXXX  # None
          type: XXXXX  # scalar|physical
          xb_scale: XXXXX  # float
          xb_value: XXXXX  # float
          err: XXXXX  # float
          err_type: XXXXX  # max|avg
          lower_bound: XXXXX  # float
          upper_bound: XXXXX  # float
          glob_err:
            total: XXXXX  # float
            unit_scale: XXXXX  # float
            surface_unit: XXXXX  # bool
            frequency_unit: XXXXX  # bool
            account_correlations: XXXXX  # bool
          lowlim_error:
            err: XXXXX  # float
            unit_scale: XXXXX  # float
          hcorrelations:
            sigma: XXXXX  # float
            landsea: XXXXX  # bool
            sigma_land: XXXXX  # float
            sigma_sea: XXXXX  # float
            filelsm: XXXXX  # str
            dump_hcorr: XXXXX  # bool
            dircorrel: XXXXX  # str
            evalmin: XXXXX  # float
            crop_chi: XXXXX  # bool
          tcorrelations:
            multi_sigmas: XXXXX  # bool
            sigmas:
              any_key:
                sigma_t: XXXXX  # float
                type: XXXXX  # str
            sigma_t: XXXXX  # str
            type: XXXXX  # isotrope|frequency|category
            dump_tcorr: XXXXX  # bool
            dircorrel: XXXXX  # str
            evalmin: XXXXX  # float
            crop_chi: XXXXX  # None
          bands_lat, bands_lon: XXXXX  # list
          bands_i, bands_j: XXXXX  # list
          regions_infos:
            dir: XXXXX  # str
            file: XXXXX  # str
            plugin:
              name: XXXXX  # str
              version: XXXXX  # str
          regions_lsm: XXXXX  # bool
73 regions_lsm: XXXXX # bool