####################################################################################
How to add a new type of flux data to be processed by the CIF into a model's inputs
####################################################################################

.. role:: bash(code)
   :language: bash

Pre-requisites
================

Before starting to implement a new flux plugin, you must have:

- a yaml file ready with a simulation that works with known plugins;
- a folder where the data you need to implement is stored;
- basic information about the data you need to implement (licensing, format, etc.).

Below, we help you navigate through the different documentation pages needed to implement your plugin.
The main reference pages are :doc:`the datastream documentation page ` and :doc:`the flux template documentation page`.

Switch from working fluxes to the reference template
=====================================================

The :bash:`datavect` paragraph of your working yaml should look like this:

.. container:: toggle

    .. container:: header

        Example with CHIMERE

    .. code-block:: yaml
        :linenos:

        datavect:
          plugin:
            name: standard
            version: std
          components:
            flux:
              parameters:
                CO2:
                  plugin:
                    name: CHIMERE
                    type: flux
                    version: AEMISSIONS
                  file_freq: 120H
                  dir: some_dir
                  file: some_file

Do the following to make it work with the template flux:

1. follow the initial steps in :doc:`the flux template documentation page` to initialize your new plugin and register it.
   This includes copying the template folder to a new path and changing the variables :bash:`_name`, :bash:`_fullname` and :bash:`_version` in the file :bash:`__init__.py`.
2. update your Yaml to use the template flux (renamed to the name of your choice). It should now look like this:

   .. container:: toggle

       .. container:: header

           Show/Hide Code

       .. code-block:: yaml
           :linenos:

           datavect:
             plugin:
               name: standard
               version: std
             components:
               flux:
                 parameters:
                   CO2:
                     plugin:
                       name: your_new_name
                       type: flux
                       version: your_version

3. run your test case again. It should generate fluxes with random values, as in the template.

Document your plugin
====================

Before going further, be sure to document your plugin properly.
To do so, please replace the docstring header in the file :bash:`__init__.py`.
Include the following information:

- licensing information
- a permanent link to download the data (or a contact person if no link is publicly available)
- the data format (temporal and horizontal resolution, names and shape of the data files)
- any specific treatment that prevents the plugin from working with other types of files

Build and check the documentation
=================================

Before going further, please compile the documentation and check that your new plugin appears in the list of datastream plugins :doc:`here`.
Also check that the documentation of your new plugin is satisfactory.

To compile the documentation, use the command:

.. code-block:: bash

    cd $CIF_root/docs
    make html

Further details can be found :doc:`here`.

Updating functions and data to implement your flux data
========================================================

Your new plugin needs several functions to be implemented in order to work.

fetch
------

The :bash:`fetch` function determines what files and corresponding dates are available for running the present case.
The structure of the :bash:`fetch` function is shown here: :ref:`datastreams-fetch-funtions`.
Please read carefully all explanations therein before starting to implement your case.
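For illustration only, here is a minimal sketch of what a :bash:`fetch` implementation could look like for a hypothetical data stream made of daily NetCDF files named :bash:`flux_YYYYMMDD.nc`.
The signature, the content of :bash:`input_dates` and the structure of the returned dictionaries are assumed to follow the template you copied; always refer to :ref:`datastreams-fetch-funtions` and to your copy of the template for the authoritative structure.

.. code-block:: python

    import datetime
    import os


    def fetch(ref_dir, ref_file, input_dates, target_dir,
              tracer=None, component=None, **kwargs):
        """Illustrative sketch only: daily files named flux_YYYYMMDD.nc.

        The exact signature and return structure must match the template
        you copied; they are only assumed here.
        """
        list_files = {}
        list_dates = {}
        for datei, period_dates in input_dates.items():
            files = []
            dates = []
            # Loop over the days covered by this sub-simulation,
            # assuming period_dates is a list of datetimes
            day = min(period_dates).replace(hour=0, minute=0,
                                            second=0, microsecond=0)
            while day <= max(period_dates):
                native_file = os.path.join(
                    ref_dir, day.strftime("flux_%Y%m%d.nc"))
                if os.path.isfile(native_file):
                    files.append(native_file)
                    # Each native file is assumed to cover one full day
                    dates.append([day, day + datetime.timedelta(days=1)])
                    # Link the native file into the working directory
                    local = os.path.join(target_dir,
                                         os.path.basename(native_file))
                    if not os.path.lexists(local):
                        os.symlink(native_file, local)
                day += datetime.timedelta(days=1)
            list_files[datei] = files
            list_dates[datei] = dates
        return list_files, list_dates

Note that the file name pattern, the daily frequency and the symbolic linking into :bash:`target_dir` are only examples; adapt them to your own data stream (for instance through the :bash:`dir`, :bash:`file` and :bash:`file_freq` arguments described below).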
By default, the :bash:`fetch` function uses the arguments :bash:`dir` and :bash:`file` from your yaml.
Make sure to update your yaml accordingly:

.. container:: toggle

    .. container:: header

        Show/Hide Code

    .. code-block:: yaml
        :linenos:

        datavect:
          plugin:
            name: standard
            version: std
          components:
            flux:
              parameters:
                CO2:
                  plugin:
                    name: your_new_name
                    type: flux
                    version: your_version
                  dir: path_to_data
                  file: file_name

Depending on how you implement your data stream, extra parameters may be needed.
Please document them on-the-fly in the :bash:`input_arguments` variable in :bash:`__init__.py`.
One common parameter is :bash:`file_freq`, which gives the frequency of the input files (independently of the simulation to be computed).

Once implemented, re-run your test case.
You can check that everything went as expected by verifying that:

1. links to the original data files are initialized in the folder :bash:`$workdir/datavect/flux/your_species/`
2. the list of dates and files is initialized as expected.
   To check this, use the option :bash:`dump_debug` in the :bash:`datavect` paragraph of the yaml (see details :doc:`here`).
   It dumps the list of dates and files to a file named :bash:`$workdir/datavect/flux.your_species.txt`

get_domain (optional)
---------------------

A datastream plugin needs to be described by a domain to be processed in pyCIF.
There are three approaches to associate a valid domain with your flux data.
The first two are given for information, but the third one is to be preferred in most cases:

1. fetch it from another object in the set-up.
   This is relevant when the domain should be exactly the same as that of another Plugin in your configuration.
   For instance, if you are implementing a flux plugin dedicated to a model, you will expect it to have exactly the same domain as the model.
   To ensure that your flux plugin fetches the domain from the present set-up, it is possible to define a so-called :doc:`requirement `.
   This is done by adding the following lines to the :bash:`__init__.py` file:

   .. code-block:: python

       requirements = {
           "domain": {"name": "CHIMERE", "version": "std", "empty": False},
       }

   In that case, the flux plugin expects a CHIMERE domain to be defined; otherwise, pycif raises an exception.

2. directly define the domain in the yaml as a sub-paragraph. It will look like this:

   .. container:: toggle

       .. container:: header

           Show/Hide Code

       .. code-block:: yaml
           :linenos:

           datavect:
             plugin:
               name: standard
               version: std
             components:
               flux:
                 parameters:
                   CO2:
                     plugin:
                       name: your_new_name
                       type: flux
                       version: your_version
                     dir: path_to_data
                     file: file_name
                     domain:
                       plugin:
                         name: my_domain_name
                         version: my_domain_version
                       some_extra_parameters: grub

   Such an approach is not necessarily recommended, as it forces the user to properly configure his/her Yaml file to make the case work properly.

   .. warning::

       If this path is chosen, please document the usage very carefully.

3. use the function :bash:`get_domain` to define the domain dynamically, based on input files or on fixed parameters.
   The structure of the :bash:`get_domain` function is shown here: :ref:`datastreams-get_domain-funtions`.
   Please read carefully all explanations therein before starting to implement your case; an illustrative sketch is also given after this list.
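For illustration only, here is a minimal sketch of what a :bash:`get_domain` implementation could look like for data stored on a regular longitude/latitude grid, with the grid-cell centres given by 1D variables named :bash:`longitude` and :bash:`latitude` in the native NetCDF files (hypothetical names, to be adapted to your data).
The construction of the domain through the :bash:`dummy` domain plugin and the :bash:`Setup` helper mirrors what the template does at the time of writing; if your copy of the template differs, follow the template and :ref:`datastreams-get_domain-funtions` rather than this sketch.

.. code-block:: python

    import glob
    import os

    import xarray as xr

    from pycif.utils.classes.setup import Setup


    def get_domain(ref_dir, ref_file, input_dates, target_dir, tracer=None):
        """Illustrative sketch: build the domain from a native file.

        Assumes a regular grid stored in 1D variables named 'longitude'
        and 'latitude'; adapt to your own data.
        """
        # Pick any native file to read the grid from
        files = glob.glob(os.path.join(ref_dir, "*.nc"))
        if not files:
            raise IOError("No file to read the domain from in " + ref_dir)

        with xr.open_dataset(files[0], decode_times=False) as ds:
            lon = ds["longitude"].values
            lat = ds["latitude"].values

        # Grid-cell sizes, assuming a regular grid of cell centres
        dlon = abs(lon[1] - lon[0])
        dlat = abs(lat[1] - lat[0])

        # Build a rectangular 'dummy' domain, as done in the template
        setup = Setup.from_dict(
            {
                "domain": {
                    "plugin": {"name": "dummy",
                               "version": "std",
                               "type": "domain"},
                    "xmin": float(lon.min()) - dlon / 2,
                    "xmax": float(lon.max()) + dlon / 2,
                    "ymin": float(lat.min()) - dlat / 2,
                    "ymax": float(lat.max()) + dlat / 2,
                    "nlon": lon.size,
                    "nlat": lat.size,
                }
            }
        )
        Setup.load_setup(setup, level=1)
        return setup.domain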
Once implemented, re-run your test case.
The implementation of the correct domain has an impact on the native resolution used to randomly generate fluxes (remember that the :bash:`read` function still comes from the template and thus generates random fluxes for the corresponding domain).
Therefore, pycif automatically reprojects the fluxes from the implemented domain to your model's domain.
One can check that the implemented domain is correct by:

1. checking that the flux files generated for your model seem to follow the native resolution of your data
2. dumping intermediate data during the computation of pycif.
   To do so, activate the option :bash:`save_debug` in the :bash:`obsoperator`:

   .. container:: toggle

       .. container:: header

           Show/Hide Code

       .. code-block:: yaml
           :linenos:

           obsoperator:
             plugin:
               name: standard
               version: std
             save_debug: True

   When activated, this option dumps intermediate states in :bash:`$workdir/obsoperator/$run_id/transform_debug/`.
   One has to find the ID of the :bash:`regrid` transform reprojecting the native fluxes to your model's domain.
   This information can be found in :bash:`$workdir/obsoperator/transform_description.txt`.
   Once the transform ID is retrieved, go to the folder :bash:`$workdir/obsoperator/$run_id/transform_debug/$transform_ID`.
   The directory tree below that folder can be complex; go to the deepest level.
   You should find two netCDF files, one for the inputs and one for the outputs.
   In the inputs, you should find the native resolution; in the outputs, the projected one.

read
-----

The :bash:`read` function simply reads data for a list of dates and files as deduced from the :bash:`fetch` function.
The expected structure for the :bash:`read` function is shown here: :ref:`datastreams-read-funtions`.
This function is rather straightforward to implement.
Be sure to have the following structure in the outputs:

.. code-block:: python

    # output_data must have shape (ndate, nlevel, nlat, nlon)
    # output_dates must contain the start date of each time interval
    return xr.DataArray(
        output_data,
        coords={"time": output_dates},
        dims=("time", "lev", "lat", "lon"),
    )

Similarly to the :bash:`get_domain` function, it is possible to check that the :bash:`read` function is properly implemented by using the option :bash:`save_debug` and checking that the input fluxes are correct.

.. warning::

    It is likely that the fluxes in your native data stream do not have the same unit as the one expected by your model.
    To convert the unit properly, add the :bash:`unit_conversion` paragraph to your Yaml file:

.. container:: toggle

    .. container:: header

        Show/Hide Code

    .. code-block:: yaml
        :linenos:

        datavect:
          plugin:
            name: standard
            version: std
          components:
            flux:
              parameters:
                CO2:
                  plugin:
                    name: your_new_name
                    type: flux
                    version: your_version
                  dir: path_to_data
                  file: file_name
                  unit_conversion:
                    scale: ${scaling factor to apply}

write (optional)
-----------------

This function is optional and is needed only when it is called by other plugins.
One probably does not need to bother with it at the moment...
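Coming back to the :bash:`read` function described above, the following is a minimal, illustrative sketch of a possible implementation for a hypothetical data stream in which each native NetCDF file stores a single two-dimensional flux field in a variable named :bash:`emis`.
The signature and the pairing of :bash:`dates` and :bash:`files` (one date interval per file here, consistent with the example :bash:`fetch` sketch above) are assumed to follow the template you copied and the lists produced by your :bash:`fetch` function; refer to :ref:`datastreams-read-funtions` for the authoritative structure.

.. code-block:: python

    import numpy as np
    import xarray as xr


    def read(self, name, varnames, dates, files, interpol_flx=False,
             comp_type=None, tracer=None, model=None, **kwargs):
        """Illustrative sketch: one 2D flux field per native file.

        Assumes the species is stored in a variable named 'emis' with
        dimensions (lat, lon); adapt names and shapes to your own data.
        """
        list_data = []
        output_dates = []
        # One date interval per file is assumed here, matching what the
        # example fetch sketch above returns
        for (date_start, date_end), native_file in zip(dates, files):
            with xr.open_dataset(native_file) as ds:
                flx = ds["emis"].values  # assumed shape (nlat, nlon)
            # Add time and vertical dimensions: (1, 1, nlat, nlon)
            list_data.append(flx[np.newaxis, np.newaxis, ...])
            output_dates.append(date_start)

        # Final shape: (ndate, nlevel, nlat, nlon), as required above
        output_data = np.concatenate(list_data, axis=0)
        return xr.DataArray(
            output_data,
            coords={"time": output_dates},
            dims=("time", "lev", "lat", "lon"),
        )

Remember that unit conversion is handled separately through the :bash:`unit_conversion` paragraph of the yaml, so the sketch above returns the fluxes in their native unit.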