Transforms (transform)¶
Available Transforms (transform)¶
The following sub-types and transforms are implemented in pyCIF so far:
- Basic
  - Available Basic
    - Adding background to simulations (background/std)
    - Clip & crop (clipcrop/std)
    - Element-wise product (product/std)
    - Horizontal re-gridding (regrid/std)
    - Temporal interpolation and re-indexing (time_interpolation/std)
    - Tracer families or addition (families/std)
    - Unit conversion (unit_conversion/std)
    - Vertical interpolation and re-mapping (vertical_interpolation/std)
    - Exponential (exp/std)
- Complex
- System
  - Available System
    - Computing a transport model (run_model/std)
    - Data initialization from the control vector (fromcontrol/std)
    - Dump to model inputs (dump2inputs/std)
    - Dump to specific format (dump2format/std)
    - Process data to the observation vector (toobsvect/std)
    - Read model output to data (loadfromoutputs/std)
    - sample2sparse/std
    - sparse2sample/std
Description¶
The transform class executes elementary operations in pyCIF; together, these operations form the execution chain of the observation operator. The observation operator can thus be decomposed into a chain of sub-operations, each carried out by a transform.
Transforms have a standardized input and output format, which makes them fully interchangeable. This format is a datastore dictionary with the structure described in the next section.
Input and output formats¶
Transforms have a standardized format for their inputs and outputs.
The format is a regular Python dictionary (dict), whose keys are the component/parameter IDs of each input/output.
For each key, the input/output is split according to the dates of the sub-simulations corresponding to the transform.
This reads as follows:
inout_datastore = {
    "inputs": {
        (component0, param0): {
            ddi0: data0,
            ddi1: data1,
            [...]
        },
        [...]
        (componentN, paramN): {
            ddi0: data0,
            ddi1: data1,
            [...]
        }
    },
    "outputs": {
        (component0, param0): {
            ddi0: data0,
            ddi1: data1,
            [...]
        },
        [...]
        (componentN, paramN): {
            ddi0: data0,
            ddi1: data1,
            [...]
        }
    },
}
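In practice, a transform walks this nested dictionary. The sketch below is purely illustrative: the component/parameter names are hypothetical and None stands in for the actual data objects described further down:

import datetime

# A minimal datastore with one stream and one sub-simulation start date;
# ("fluxes", "CH4") is a made-up (component, parameter) ID
ddi = datetime.datetime(2019, 1, 1)
inout_datastore = {
    "inputs": {("fluxes", "CH4"): {ddi: None}},
    "outputs": {("fluxes", "CH4"): {ddi: None}},
}

# A transform typically loops over components/parameters, then over dates
for (component, param), per_date in inout_datastore["inputs"].items():
    for ddi, data in per_date.items():
        print(component, param, ddi, type(data))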
Note
Please note that the input components/parameters are not necessarily the same as for the outputs.
From here, there are two possibilities for the format of the data:
- For gridded (1- to 4-D) data, the data is an xarray.Dataset. The structure of the Dataset is:

<xarray.Dataset>
Dimensions:  (time: ntime, lev: nlev, lat: nlat, lon: nlon)
Coordinates:
  * time     (time) datetime64[ns] list_of_dates
  * lev      (lev) int64 list_of_levels
Dimensions without coordinates: lat, lon
Data variables:
    incr     (time, lev, lat, lon) float64 incr_values
    spec     (time, lev, lat, lon) float64 spec_values
  The xarray.DataArray spec contains the values of the corresponding parameter; incr contains the corresponding increments in the case of a tangent-linear simulation.
- For sparse data (e.g., observations), the data is a pandas.DataFrame. The structure is the same as the one described here for observation inputs. The extra columns spec, and optionally incr for tangent-linear computations, are included to store the local input/output parameter.
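As an illustration, a gridded entry and a sparse entry could be built as follows. All dimension sizes, dates, station names and values are made up for the example:

import numpy as np
import pandas as pd
import xarray as xr

ntime, nlev, nlat, nlon = 2, 3, 4, 5
dates = pd.date_range("2019-01-01", periods=ntime, freq="h")
shape = (ntime, nlev, nlat, nlon)

# Gridded case: an xarray.Dataset with "spec" (and "incr" for
# tangent-linear runs) on (time, lev, lat, lon)
gridded = xr.Dataset(
    {
        "spec": (("time", "lev", "lat", "lon"), np.ones(shape)),
        "incr": (("time", "lev", "lat", "lon"), np.zeros(shape)),
    },
    coords={"time": dates, "lev": np.arange(nlev)},
)

# Sparse case: a pandas.DataFrame with one row per observation; the
# observation columns follow the observation datastore format, plus the
# extra "spec" (and optionally "incr") columns described above
sparse = pd.DataFrame(
    {
        "date": dates,
        "station": ["STA1", "STA2"],
        "spec": [1.8e-6, 1.9e-6],
        "incr": [0.0, 0.0],
    }
)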
Required parameters, dependencies and functions¶
The following attributes, dependencies and functions should be defined for any transform, as they are called by other plugins. They can be parameters to define at the set-up step, functions to implement in the corresponding module, or dependencies to be attached to the transform class.
Parameters and attributes¶
mapper¶
Each transform is defined by a so-called mapper. The mapper is a dictionary including all the metadata about the inputs, outputs and, if applicable, sub-simulations.
It is defined in the function ini_mapper (see below), called at the initialization of the observation operator.
Metadata about inputs/outputs are given for every component/parameter involved in the transform as input/output. All pieces of information are optional and depend on what is needed to compute the transform itself.
They read as follows:
mapper = {
    "inputs": {
        (component1, tracer1): [...],
        (component1, tracer2): [...],
        (component2, tracer3): [...],
        [...]
    },
    "outputs": {
        (component1, tracer1): [...],
        (component1, tracer2): [...],
        (component2, tracer3): [...],
        [...]
    }
}
Note
The inputs and outputs do not necessarily have the same number of components/tracers.
The pieces of information to specify in each component/tracer of the inputs/outputs are:
- input_files: dictionary of input files as defined in the fetch functions of the datastreams class (see here)
- input_dates: same as above for input dates
- force_loadin: forces inputs prior to the transform to be loaded; this means that the data needs to be handled by the transform itself; should be set to True in general
- force_dump: forces inputs prior to the transform to be dumped
- force_loadout: forces outputs posterior to the transform to be loaded

Note

The two arguments force_loadout and force_dump are used when the transform needs to read and return data as external files. This is typically the case for chemistry-transport models that cannot directly use the data in memory, but rather use files.

- domain: the domain on which the data is given
- continuous_hdomain: the domain is continuous in the horizontal direction; this means that the horizontal interpolation to fit observations is done internally by the transform
- continuous_vdomain: same as above in the vertical direction

Note

The two options continuous_hdomain and continuous_vdomain are used when interpolations to fit observations are done internally by the model/transform. For instance, Lagrangian particle dispersion models naturally use these options, as footprints are computed beforehand at given locations. Conversely, for Eulerian models, it is recommended to switch off any internal interpolation function and let pyCIF do it itself, thus setting continuous_hdomain and continuous_vdomain to False.

- is_lbc: the data is used at the sides of the domain; used when regridding prior to the transform
- is_top: same as above when the data is used at the top of the domain; used when vertically interpolating prior to the transform
- sparse_data: the sparse data format (see above) is used/returned
- sampled: the output data is a sample of the full 4-D output on the full domain; typically used when a model returns a list of concentrations at given grid cells / time stamps; a correspondence is thus needed to fit simulated concentrations back to the observation realm
- tracer: the tracer plugin associated to the input/output
- component: the component plugin associated to the input/output

Note

component and tracer are to be used if attributes from the component/tracer are needed to compute the transform.

- unit: the unit of the corresponding component/tracer; this is used to determine whether a unit conversion should be performed; if no unit is defined, unitless values are assumed, which may be incompatible with expected values elsewhere and possibly raise an error. Please see how the unit_conversion transform behaves for further details here

An illustrative mapper combining some of these pieces of information is sketched below, after the description of the sub-simulation keys.
Extra information can be specified in the mapper on top of the above-mentioned details about inputs and outputs:
- subsimus: all information about sub-simulations and corresponding dates for all inputs and outputs. This key reads as follows:
mapper["subsimus"] = { "inputs": { (component1, tracer1): { ddi1: [list of dates intervals], ddi2: [list of dates intervals], [...] } }, "outputs": { (component1, tracer1): { ddi1: [list of dates intervals], ddi2: [list of dates intervals], [...] } }, }
- fixed_subsimus: when True, sub-simulations need to be explicitly defined in subsimus and will not be influenced by the rest of the computation pipeline
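To make the above concrete, here is an illustrative mapper for a hypothetical re-gridding transform. The component/tracer names, flags and values are assumptions for the example, not prescriptions:

# "source_domain" and "target_domain" would be pyCIF domain plugins in a
# real set-up; they are left as placeholders here
source_domain = None
target_domain = None

mapper = {
    "inputs": {
        ("fluxes", "CH4"): {
            "force_loadin": True,   # the transform handles the data itself
            "domain": source_domain,
            "unit": "kg/m2/s",
        }
    },
    "outputs": {
        ("fluxes", "CH4"): {
            "domain": target_domain,
            "unit": "kg/m2/s",
            "sparse_data": False,
        }
    },
}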
Functions¶
ini_mapper¶
The function ini_mapper is called at the initialization of the transform. It returns the mapper as defined above.
Click below for a full example of the ini_mapper function for the families transform (details here).
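Pending that full example, a schematic ini_mapper is sketched below; the simplified signature and the declared component/tracer are hypothetical:

def ini_mapper(transform, **kwargs):
    """Declare one input and one output stream for the transform.

    Schematic version: the real pyCIF function receives more context
    from the observation operator at initialization.
    """
    mapper = {
        "inputs": {("fluxes", "CH4"): {"force_loadin": True}},
        "outputs": {("fluxes", "CH4"): {}},
    }
    return mapper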
forward¶
The function forward computes the transformation in forward mode.
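As an illustration, a forward function for an element-wise exponential (in the spirit of the exp/std transform) might look as follows; the signature is simplified and hypothetical:

import numpy as np


def forward(transform, inout_datastore, **kwargs):
    # Apply exp() to every input stream and store the result as output;
    # hypothetical signature, for illustration only
    for trid, per_date in inout_datastore["inputs"].items():
        for ddi, data in per_date.items():
            out = data.copy()
            out["spec"] = np.exp(data["spec"])
            inout_datastore["outputs"].setdefault(trid, {})[ddi] = out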
adjoint¶
The function adjoint computes the transformation in backward (adjoint) mode.
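Continuing the exponential example, the adjoint propagates increments backwards through the chain: since the tangent-linear of exp is incr_out = exp(spec) * incr_in, the adjoint multiplies the back-propagated increments by the stored exp(spec). The signature is again hypothetical:

def adjoint(transform, inout_datastore, **kwargs):
    # Back-propagate increments from outputs to inputs; the forward run
    # stored exp(spec) in the output "spec" variable, which is also the
    # local derivative of the exponential
    for trid, per_date in inout_datastore["outputs"].items():
        for ddi, data in per_date.items():
            back = data.copy()
            back["incr"] = data["spec"] * data["incr"]
            inout_datastore["inputs"].setdefault(trid, {})[ddi] = back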