.. role:: bash(code)
   :language: bash


##############################
Transforms (:bash:`transform`)
##############################

.. contents:: Contents
    :local:

Available Transforms (:bash:`transform`)
=========================================

The following sub-types and :bash:`transforms` are implemented in pyCIF so far:

.. toctree::

  basic/index
  complex/index
  system/index

.. role:: raw-math(raw)
    :format: latex html

Description
===========

The :bash:`transform` class executes elementary operations in pyCIF.
Together, transforms form the execution chain of the observation operator.

The observation operator can be decomposed into sub-operations as follows:

.. math::
    
    \mathcal{H}(\mathbf{x}) = ( \mathcal{H}_1 \circ \mathcal{H}_2 \circ \cdots \circ \mathcal{H}_N ) (\mathbf{x})


Each operation is given by a :bash:`transform`.
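
The composition above can be sketched in plain Python. This is only an
illustration of the chaining principle, not pyCIF code: the helper ``compose``
and the toy operators ``scale``, ``shift`` and ``total`` are hypothetical names
introduced here.

.. code-block:: python

    from functools import reduce


    def compose(*operators):
        """Return H_1 o H_2 o ... o H_N; the last operator is applied first."""

        def composed(x):
            # Walk the chain from H_N back to H_1, feeding each result forward.
            return reduce(lambda value, op: op(value), reversed(operators), x)

        return composed


    # Hypothetical elementary operations standing in for transforms.
    def scale(x):
        return [2.0 * v for v in x]


    def shift(x):
        return [v + 1.0 for v in x]


    def total(x):
        return sum(x)


    # H = total o shift o scale: scale runs first, total runs last.
    obs_operator = compose(total, shift, scale)
    result = obs_operator([1.0, 2.0, 3.0])

Because every operator accepts the output of the one applied before it, any
operator in the chain can be swapped without touching the others.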

Transforms have a standardized input and output format, which makes them fully interchangeable.
The input and output format is a datastore dictionary, whose structure is described below.

Input and output formats
==========================

Transforms have a standardized format for their inputs and outputs.
The format is a regular Python dictionary (:bash:`dict`) with two top-level keys,
``"inputs"`` and ``"outputs"``.
Under each of them, the keys are the component/parameter IDs of each input/output.
For each key, the input/output is split according to the dates of each sub-simulation
corresponding to the transform.

This reads as follows:

.. code-block:: python

    inout_datastore = {
        "inputs": {
            (component0, param0): {
                ddi0: data0,
                ddi1: data1,
                [...]
            },
            [...]
            (componentN, paramN): {
                ddi0: data0,
                ddi1: data1,
                [...]
            }
        },
        "outputs": {
            (component0, param0): {
                ddi0: data0,
                ddi1: data1,
                [...]
            },
            [...]
            (componentN, paramN): {
                ddi0: data0,
                ddi1: data1,
                [...]
            }
        },
    }

.. note::

    Please note that the input components/parameters are not necessarily the same
    as for the outputs.
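
As a minimal sketch of this nesting, the snippet below builds a small datastore
and lists its entries. The component and parameter names, the use of
``datetime`` objects for the ``ddi`` keys, the string placeholders used as data
and the helper ``list_entries`` are all assumptions made for illustration.

.. code-block:: python

    import datetime

    # Assumption: sub-simulation keys are the start dates of each sub-simulation.
    ddi0 = datetime.datetime(2019, 1, 1)
    ddi1 = datetime.datetime(2019, 2, 1)

    inout_datastore = {
        "inputs": {
            # Hypothetical (component, parameter) identifier.
            ("flux", "CH4"): {ddi0: "data for January", ddi1: "data for February"},
        },
        "outputs": {
            # Outputs may use different components/parameters than the inputs.
            ("concs", "CH4"): {ddi0: "data for January", ddi1: "data for February"},
        },
    }


    def list_entries(datastore, direction):
        """Return sorted (component, parameter, ddi) triplets for one direction."""
        return sorted(
            (component, param, ddi)
            for (component, param), periods in datastore[direction].items()
            for ddi in periods
        )


    input_entries = list_entries(inout_datastore, "inputs")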

From here, there are two possible formats for the :bash:`data`:

1. for gridded (from 1- to 4-D) data, the :bash:`data` is an
   `xarray.Dataset <https://xarray.pydata.org/en/stable/generated/xarray.Dataset.html>`__.
   The structure of the Dataset is:
   
   .. code-block:: text
   
        <xarray.Dataset>
        Dimensions:  (time: ntime, lev: nlev, lat: nlat, lon: nlon)
        Coordinates:
          * time     (time) datetime64[ns] list_of_dates
          * lev      (lev) int64 list_of_levels
        Dimensions without coordinates: lat, lon
        Data variables:
            incr     (time, lev, lat, lon) float64 incr_values
            spec     (time, lev, lat, lon) float64 spec_values

   The `xarray.DataArray
   <https://xarray.pydata.org/en/stable/generated/xarray.DataArray.html>`__ :bash:`spec`
   contains the values of the corresponding parameter.
   :bash:`incr` includes the corresponding increments in the case of a tangent-linear
   simulation.
   
   
2. for sparse data (e.g., observations), the :bash:`data` is a
   `pandas.DataFrame <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`__.
   The structure is the same as the one described
   :doc:`here </documentation/input-outputs/monitor>` for observation inputs.
   The extra columns :bash:`spec` and, optionally, :bash:`incr` (for tangent-linear
   computations) are included to store the local input/output parameter.
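
The two data formats can be sketched as follows. The array sizes, the values
and the observation columns ``date`` and ``station`` are assumptions chosen for
illustration; only the variable names ``spec`` and ``incr`` and the dimension
names come from the description above.

.. code-block:: python

    import numpy as np
    import pandas as pd
    import xarray as xr

    # Gridded data: an xarray.Dataset with the variables spec and incr.
    ntime, nlev, nlat, nlon = 2, 3, 4, 5
    dims = ("time", "lev", "lat", "lon")
    shape = (ntime, nlev, nlat, nlon)

    gridded = xr.Dataset(
        data_vars={
            "spec": (dims, np.ones(shape)),   # parameter values
            "incr": (dims, np.zeros(shape)),  # tangent-linear increments
        },
        coords={
            "time": pd.date_range("2019-01-01", periods=ntime, freq="D"),
            "lev": np.arange(nlev),
        },
    )

    # Sparse data: a pandas.DataFrame with one row per observation.
    # The columns "date" and "station" are hypothetical placeholders for
    # the observation columns documented for the monitor format.
    sparse = pd.DataFrame(
        {
            "date": pd.to_datetime(["2019-01-01 00:00", "2019-01-01 06:00"]),
            "station": ["STA1", "STA2"],
            "spec": [1850.0, 1900.0],
            "incr": [0.0, 0.0],
        }
    )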
   
   
Required parameters, dependencies and functions
===============================================

The following attributes, dependencies and functions should be defined for any
:bash:`transform`, as they are called by other plugins.
They can be parameters to define at the set-up step,
functions to implement in the corresponding module,
or dependencies to be attached to the :bash:`transform` class.

Parameters and attributes
+++++++++++++++++++++++++

mapper
-------

Each transform is defined by a so-called mapper.
The mapper is a dictionary including all the metadata about the inputs, outputs,
and, if applicable, sub-simulations.

It is defined in the function :bash:`ini_mapper` (see below), which is called at the
initialization of the observation operator.

Metadata about inputs/outputs are given for every component/parameter involved
in the transform as input/output.
All pieces of information are optional and depend on what is needed to compute
the transform itself.

They read as follows:

.. code-block:: python

    mapper = {
        "inputs": {
            (component1, tracer1): [...],
            (component1, tracer2): [...],
            (component2, tracer3): [...],
            [...]
        },
        "outputs": {
            (component1, tracer1): [...],
            (component1, tracer2): [...],
            (component2, tracer3): [...],
            [...]

        }
    }

.. note::

    The inputs and outputs do not necessarily have the same number of
    components/tracers.

The pieces of information to specify in each component/tracer of the
inputs/outputs are:

:input_files: dictionary of input files as defined in the ``fetch`` functions of
    the class ``datastreams`` (see :doc:`here <../datastreams/index>`)
:input_dates: same as above for input dates
:force_loadin: forces inputs prior to the transform to be loaded;
    this means that data needs to be handled by the transform itself;
    should generally be set to ``True``
:force_dump: forces inputs to be dumped prior to the transform
:force_loadout: forces outputs to be loaded after the transform

    .. note::

        The two arguments ``force_loadout`` and ``force_dump`` are used
        when the transform needs to read and return data as external files.

        This is typically the case for chemistry-transport models, which
        cannot directly use data held in memory and rely on files instead.

:domain: the domain on which the data is given
:continuous_hdomain: the domain is continuous in the horizontal direction;
    this means that the horizontal interpolation to fit to observations is done
    internally to the transform
:continuous_vdomain: same as above in the vertical direction

    .. note::

        The two options ``continuous_hdomain`` and ``continuous_vdomain``
        are used when interpolations to fit observations are done internally to
        the model/transform.

        For instance, Lagrangian particle dispersion models naturally use
        these options, as footprints are computed beforehand at given locations.

        Conversely, for Eulerian models, it is recommended to switch off
        any interpolation function and let pyCIF do it itself, thus setting
        ``continuous_hdomain`` and ``continuous_vdomain`` to ``False``.


:is_lbc: the data is used at the sides of the domain; used when regridding
    prior to the transform
:is_top: same as above when the data is used at the top of the domain; used
    when vertically interpolating prior to the transform
:sparse_data: the sparse data format (see above) is used/returned
:sampled: the output data is a sample of the full 4D output on the full domain;
    typically this is used when a model returns a list of concentrations
    at given grid cells / time stamps; a correspondence is thus needed to
    map simulated concentrations back to the observation realm
:tracer: the tracer plugin associated to the input/output
:component: the component plugin associated to the input/output

    .. note::

        ``component`` and ``tracer`` are to be used if attributes from the
        component/tracer are needed to compute the transform

:unit: the unit of the corresponding component/tracer; this is used to determine
    whether a unit conversion should be performed; if no unit is defined,
    unitless values are assumed, which may be incompatible with
    values expected elsewhere and possibly raise an error.
    See the ``unit_conversion`` transform :doc:`here <basic/unit_conversion>`
    for further details on this behaviour.
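
The keys listed above could be filled in as in the following sketch. It assumes
that the metadata of each component/tracer is itself a dictionary; the
identifiers, the unit strings and the helper ``undocumented_keys`` are
hypothetical and only serve to illustrate the layout.

.. code-block:: python

    # Keys documented in the list above.
    DOCUMENTED_KEYS = {
        "input_files", "input_dates", "force_loadin", "force_dump",
        "force_loadout", "domain", "continuous_hdomain", "continuous_vdomain",
        "is_lbc", "is_top", "sparse_data", "sampled", "tracer", "component",
        "unit",
    }

    mapper = {
        "inputs": {
            # Hypothetical gridded input loaded in memory before the transform.
            ("flux", "CH4"): {
                "force_loadin": True,
                "is_lbc": False,
                "sparse_data": False,
                "unit": "kg/m2/s",  # illustrative unit string
            },
        },
        "outputs": {
            # Hypothetical output sampled at observation locations.
            ("concs", "CH4"): {
                "sparse_data": True,
                "sampled": True,
                "unit": "ppb",  # illustrative unit string
            },
        },
    }


    def undocumented_keys(mapper):
        """Return metadata keys that are not part of the documented list."""
        found = set()
        for direction in ("inputs", "outputs"):
            for metadata in mapper[direction].values():
                found |= set(metadata) - DOCUMENTED_KEYS
        return found


    unknown = undocumented_keys(mapper)

A check of this kind can catch misspelled keys early, since all pieces of
information are optional and a typo would otherwise go unnoticed.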

Extra information can be specified in the mapper on top of the
above-mentioned information about inputs and outputs:

:subsimus: all information about sub-simulations and corresponding dates for all
    inputs and outputs.

    This key reads as follows:

    .. code-block:: python

        mapper["subsimus"] = {
            "inputs": {
                (component1, tracer1): {
                    ddi1: [list of date intervals],
                    ddi2: [list of date intervals],
                    [...]
                }
            },
            "outputs": {
                (component1, tracer1): {
                    ddi1: [list of date intervals],
                    ddi2: [list of date intervals],
                    [...]
                }
            },
        }

:fixed_subsimus: when ``True``, sub-simulations need to be explicitly defined
    in ``subsimus`` and will not be influenced by the rest of the computation
    pipeline
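
A possible way to fill ``subsimus`` is sketched below. It assumes that each
date interval is represented as a ``(start, end)`` pair of ``datetime`` objects
and that ``ddi`` keys are sub-simulation start dates; the actual representation
in pyCIF may differ. The helper ``split_period`` and the component/tracer names
are hypothetical.

.. code-block:: python

    import datetime


    def split_period(start, n_hours, step_hours):
        """Split n_hours starting at start into consecutive (start, end) pairs."""
        step = datetime.timedelta(hours=step_hours)
        n_steps = n_hours // step_hours
        return [(start + i * step, start + (i + 1) * step) for i in range(n_steps)]


    ddi1 = datetime.datetime(2019, 1, 1)
    ddi2 = datetime.datetime(2019, 1, 2)

    mapper = {"inputs": {}, "outputs": {}}
    mapper["subsimus"] = {
        "inputs": {
            ("flux", "CH4"): {
                ddi1: split_period(ddi1, 24, 6),
                ddi2: split_period(ddi2, 24, 6),
            }
        },
        "outputs": {
            ("concs", "CH4"): {
                ddi1: split_period(ddi1, 24, 6),
                ddi2: split_period(ddi2, 24, 6),
            }
        },
    }
    # Sub-simulations are fully specified here, so they are declared as fixed.
    mapper["fixed_subsimus"] = True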


Functions
+++++++++

ini_mapper
----------

The function ``ini_mapper`` is called at the initialization of the transform.
It returns the ``mapper`` as defined above.

Click below for a full example of the :bash:`ini_mapper`
function for the transform ``families`` (details :doc:`here <basic/families>`).

.. module:: pycif.plugins.transforms.basic.families
    :noindex:

.. function:: ini_mapper

forward
-------

The function ``forward`` computes the transformation in forward mode.


adjoint
-------

The function ``adjoint`` computes the transformation in backward (adjoint) mode.
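
The relationship between the forward and backward modes can be illustrated
with a toy linear operator. The sketch below is not the pyCIF interface: the
function signatures, the matrix ``MATRIX`` and the helper ``dot`` are
assumptions. It only shows the standard consistency check between a linear
operator :math:`\mathbf{H}` and its adjoint :math:`\mathbf{H}^T`, namely
:math:`\langle \mathbf{H}\mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{x}, \mathbf{H}^T\mathbf{y} \rangle`.

.. code-block:: python

    # Hypothetical linear operator mapping 3 inputs to 2 outputs.
    MATRIX = [
        [1.0, 2.0, 0.0],
        [0.0, 1.0, 3.0],
    ]


    def forward(x):
        """Apply the operator: y = H x."""
        return [sum(h * v for h, v in zip(row, x)) for row in MATRIX]


    def adjoint(y):
        """Apply the transposed operator: x* = H^T y."""
        n_outputs, n_inputs = len(MATRIX), len(MATRIX[0])
        return [
            sum(MATRIX[i][j] * y[i] for i in range(n_outputs))
            for j in range(n_inputs)
        ]


    def dot(a, b):
        return sum(u * v for u, v in zip(a, b))


    x = [1.0, -2.0, 0.5]
    y = [3.0, 4.0]

    # Adjoint test: both scalar products must agree up to rounding errors.
    lhs = dot(forward(x), y)
    rhs = dot(x, adjoint(y))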