Modifying a model to use with EnSRF

In the following tutorial, one will learn how to modify a model plugin to make it work in batch mode with the EnSRF running mode.

Set-up a working case with independent sampling

The EnSRF methods rely on generating a random ensemble to approximately solve the inversion equations. There are two possible methods to:

  • batch_sampling: run all samples altogether into a single job; to do so, the control vector is perturbed using virtual species that are propagating all the way to the observation equivalents in the observation operator;

    Basically, for a case when CH4 emissions are optimized, the batch_sampling method will generate an ensemble (flux, CH4_1), …, (flux, CH4_N) of perturbed inputs. Then, perturbations will be propagated through all pyCIF operations, including the CTM. The outputs will be (concs, CH4_1), …, (concs, CH4_N). This will be re-composed into separated monitor.nc files for each samples.

  • parallel_sampling: the ensemble is generated by the master job; then slave jobs are run independently for each perturb control vector. And corresponding monitor.nc files are generated and processed by the master job

The parallel_sampling mode do not need any modification to work. One can test it directly with a working inversion test case.

Examples of inversion configuration files can be found here. Therein, one can look for EnSRF cases for the dummy model. General inversion configurations can be found in all sub-directories as adjtltest examples.

Implementing the batch_sampling for a specific model

Defining a correspondence between perturbed outputs and inputs

For a given configuration to be computed in batch mode, pyCIF perturbs backwards transforms along the transform pipe (see details here) from the observation vector backwards to the control vector and more generally to inputs.

For any parameter in the observation vector, e.g., (‘concs’, ‘CH4’), pyCIF will create virtual equivalents to be computed in parallel: (‘concs’, ‘CH4__sample#000’), (‘concs’, ‘CH4__sample#001’), …, (‘concs’, ‘CH4__sample#00N’). The mapper of every transform in the transform pipe is thus perturbed according to the perturbed inputs of the transform successors. To perturb inputs depending on perturbed outputs, there are two possibilities:

  1. inputs and outputs have the same name, thus perturbations are directly propagated

  2. if input names differ from output names, pyCIF needs extra information to know which inputs should propagate the perturbations. This is done by defining the variable outputs2inputs in the transform mapper, defined in the function ini_mapper. This is valid for any transform, and any model.

    The variable outputs2inputs is defined as a dictionary for which each entry is an unperturbed output, e.g., (‘concs’, ‘CH4’), and the values for each entry are the corresponding inputs to be perturbed.

    For instance, CHIMERE runs with the output (‘concs’, ‘CH4’) and the inputs (‘flux’, ‘CH4’), (‘inicond’, ‘CH4’), (‘latcond’, ‘CH4’), (‘topcond’, ‘CH4’), (‘meteo’, ‘’).

    To produce perturbed outputs, the model will need perturbed inputs for (‘flux’, ‘CH4’), (‘inicond’, ‘CH4’), (‘latcond’, ‘CH4’), (‘topcond’, ‘CH4’), but not for (‘meteo’, ‘’). Therefore, the variable outputs2inputs will look like:

    mapper["outputs2inputs"] = {
        ("concs", s): [("flux", s)]
        for s in model.chemistry.acspecies.attributes
    }
    

    Warning

    The variable outputs2inputs can have a complex behaviour in the case of complex chemical schemes, or when models are flexible in terms of needed inputs to transport a given species. Defining this variable thus need thorough though before starting. Do not hesitate to contact the core team to help you

Perturbing the behaviour of the model

To allow for the batch_sampling to work, pyCIF should be able to modify the behaviour of the model. A dedicated function perturb_model should be added to the model plugin to do so.

Usually, the perturb_model function should modify the underlying chemistry plugin to update, e.g., acspecies, emis_species, etc.

One can find an example of such a function here:

pycif.plugins.models.chimere.perturb_model(self, nsamples, transf_mapper)[source]

Creating a restart_inicond component

At the beginning of a simulation, pyCIF typically reads a prescribed file with initial conditions for the transported species (inicond). With the EnSRF mode, the first cycle starts from these prescribed initial conditions. However, the subsequent cycles must restart from the posterior concentrations of the previous cycle. To this end, when starting the next cycles, pyCIF adds a component restart_inicond to the Yaml configuration file controlling the cycle. In addition, it adds a parameter called ensrf_restart_file providing the path to the restart file to use and it also removes any occurence of the inicond component.

This restart_inicond component must be accepted by the model and processed the same way the restart files are processed for forward simulations. Some model routines must be slightly modified to ensure this:

  1. In the function ini_mapper, the restart_inicond component must be added:

    if getattr(model, "ensrf_restart_file", False):
        for spec in model.chemistry.acspecies.attributes:
            mapper["inputs"][('restart_inicond', spec)] = {
                "input_dates": {model.datei: np.array([[model.datei, model.datei]])},
                "domain": domain,
                "force_dump": True,
                "sparse_data": False,
                "break_fwd_onlyinit_pipe": False,
                "break_adj_onlyinit_pipe": False
            }
    
    else:
        for spec in model.chemistry.acspecies.attributes:
            mapper["inputs"][('inicond', spec)] = {
                "input_dates": {model.datei: np.array([[model.datei, model.datei]])},
                "domain": domain,
                "force_dump": True,
                "sparse_data": False,
                "break_fwd_onlyinit_pipe": False,
                "break_adj_onlyinit_pipe": False
            }
    

2. In the function native2inputs, the restart_inicond component type must be properly handled. Note that the function to write is dependent on the model. Below, an example for the model ICON-ART is given:

# Initial conditions from a restart file
elif input_type == 'restart_inicond':
    for trid in datastore:
        dirorig = datastore[trid]["dirorig"]
        fileorig = datastore[trid]["fileorig"]

        source_file = os.path.join(dirorig, fileorig)

        target_file = "{}/restart_atm_DOM01.nc".format(runsubdir)
        path.link(source_file, target_file)