Control vectors#

In pyCIF, the control vector \(\mathbf{x}\) is stored in two main formats:

  • a pickle binary file directly usable by pyCIF

  • a NetCDF file intended for the user

pickle binary file#

pyCIF saves intermediate control vectors at different steps of its execution.

The variables saved in the pickle are:

  • \(\mathbf{x}\): the current values of the control vector

  • \(\mathbf{\delta x}\): the current values of the sensitivities to the control vector (adjoint outputs)

  • \(\mathbf{x}^\textrm{b}\): the values of the prior control vector

  • \(\mathbf{\sigma}^\textrm{b}\): the diagonal terms of the matrix \(\mathbf{B}\)

  • \(\mathbf{\sigma}^\textrm{a}\): the diagonal terms of the matrix \(\mathbf{P}^\textrm{a}\); this term is only available when dumping the posterior control vector in inversions computing posterior uncertainties

All these variables are dumped as one-dimensional vectors, and each term corresponds to one element of the control vector (hence not necessarily at the pixel resolution).

Warning

It is quite complicated to use the pickle format directly, as the variables are flattened and no information is kept to re-use them outside a pyCIF simulation. It is therefore recommended to use only the NetCDF format for post-processing.

NetCDF file#

It is possible to ask pyCIF to dump the control vector in a more user-friendly format, as a NetCDF file projected onto the physical horizontal, vertical and temporal resolutions.

The option to use is save_out_netcdf in the controlvect paragraph of the YAML configuration file.

When dumping as a NetCDF, pyCIF will create a directory tree following the structure of your datavect. You will find one directory per component, each containing one NetCDF file per species. Below is a simple example:

controlvect
├── fluxes
│   ├── controlvect_fluxes_CH4.nc
│   ├── controlvect_fluxes_CO2.nc
│   └── controlvect_fluxes_N2O.nc
├── biofluxes
│   ├── controlvect_biofluxes_CO2.nc
│   └── controlvect_biofluxes_N2O.nc
└── inicond
    ├── controlvect_inicond_CO2.nc
    └── controlvect_inicond_CH4.nc

Only optimized variables (i.e., those using the keyword hresol in the YAML) will be dumped in this case.
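The directory layout above lends itself to simple file discovery in post-processing scripts. The following is a minimal sketch, not part of pyCIF: the helper name list_controlvect_files is hypothetical, and it assumes that files follow the pattern controlvect_<component>_<species>.nc seen in the example tree.

```python
from pathlib import Path


def list_controlvect_files(root):
    """Map (component, species) to the NetCDF file found under `root`.

    Assumed layout (taken from the example tree, not from the pyCIF API):
    <root>/<component>/controlvect_<component>_<species>.nc
    """
    found = {}
    for path in sorted(Path(root).glob("*/controlvect_*.nc")):
        component = path.parent.name
        prefix = f"controlvect_{component}_"
        # Skip files that do not follow the expected naming pattern
        if not path.stem.startswith(prefix):
            continue
        species = path.stem[len(prefix):]
        found[(component, species)] = path
    return found
```

Calling list_controlvect_files("controlvect") on the example tree would return one entry per component and species pair, for instance the key ("fluxes", "CH4").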

In each netCDF file, the structure will be as follows:

netcdf controlvect_fluxes_CH4 {
dimensions:
    time = 2 ;
    lev = 1 ;
    lat = 80 ;
    lon = 100 ;
    time_phys = 26 ;
    latc = 81 ;
    lonc = 101 ;
variables:
    int64 time(time) ;
        time:units = "days since 2019-01-01 00:00:00" ;
        time:calendar = "proleptic_gregorian" ;
    int64 lev(lev) ;
    double x(time, lev, lat, lon) ;
        x:_FillValue = NaN ;
    double xb(time, lev, lat, lon) ;
        xb:_FillValue = NaN ;
    double b_std(time, lev, lat, lon) ;
        b_std:_FillValue = NaN ;
    int64 time_phys(time_phys) ;
        time_phys:units = "hours since 2019-01-01 00:00:00" ;
        time_phys:calendar = "proleptic_gregorian" ;
    double x_phys(time_phys, lev, lat, lon) ;
        x_phys:_FillValue = NaN ;
    double xb_phys(time_phys, lev, lat, lon) ;
        xb_phys:_FillValue = NaN ;
    double b_phys(time_phys, lev, lat, lon) ;
        b_phys:_FillValue = NaN ;
    double latitudes(lat, lon) ;
        latitudes:_FillValue = NaN ;
    double latitudes_corner(latc, lonc) ;
        latitudes_corner:_FillValue = NaN ;
    double longitudes(lat, lon) ;
        longitudes:_FillValue = NaN ;
    double longitudes_corner(latc, lonc) ;
        longitudes_corner:_FillValue = NaN ;

// global attributes:
        :_NCProperties = "version=1|netcdflibversion=4.6.1|hdf5libversion=1.10.4" ;
}

The variables are the same as in the pickle file. Variables without the suffix _phys are exactly those stored in pyCIF, i.e., they contain either scaling factors or physical values, depending on the type keyword in the yml. Variables with the suffix _phys are expressed in the physical space of the corresponding inputs: for physical variables, they are identical to the variables without the suffix _phys; for scalar variables, the scaling factors are multiplied by the corresponding values in the input files.
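The relation between the two families of variables can be sketched as follows. This is an illustration only, not pyCIF code: the function to_physical is hypothetical, the strings "physical" and "scalar" are assumed values of the type keyword, and plain Python lists stand in for the gridded arrays.

```python
def to_physical(x, inputs, vtype):
    """Return physical-space values for one control-vector variable.

    x      : control-vector values (list of floats)
    inputs : values from the input files on the same grid (list of floats)
    vtype  : "physical" or "scalar" (assumed values of the `type` keyword)
    """
    if vtype == "physical":
        # Physical variables are already expressed in physical space
        return list(x)
    if vtype == "scalar":
        # Scaling factors multiply the corresponding input values
        if len(x) != len(inputs):
            raise ValueError("x and inputs must have the same length")
        return [xi * vi for xi, vi in zip(x, inputs)]
    raise ValueError(f"unknown type: {vtype!r}")
```

With made-up numbers, scaling factors [1.0, 0.5] applied to inputs [10.0, 20.0] give physical values [10.0, 10.0].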

The horizontal resolution of all variables is the horizontal domain of the underlying data. The temporal resolution is that of the corresponding element of the control vector for variables without the suffix _phys (i.e. as determined by the keywords tresol and tsubresol in the yml). For variables with the suffix _phys, the temporal resolution is a merge between the control vector resolution and the native input resolution.
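When comparing variables on the time and time_phys axes, it can help to decode both axes to dates and to find which control-vector period each physical time step falls in. The sketch below uses only the Python standard library and the units strings shown in the file header. The helper names are hypothetical, and it assumes that the time coordinate holds the start date of each control-vector period, which is not stated in this documentation.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def decode_time(values, units):
    """Decode integer offsets such as 'days since 2019-01-01 00:00:00'."""
    step, _, origin = units.split(" ", 2)
    if step not in ("days", "hours"):
        raise ValueError(f"unsupported time unit: {step!r}")
    start = datetime.strptime(origin, "%Y-%m-%d %H:%M:%S")
    return [start + timedelta(**{step: int(v)}) for v in values]


def period_index(times_phys, period_starts):
    """For each physical time step, return the index of the control-vector
    period whose start date is the latest one not after that step.

    Assumes `period_starts` is sorted and holds period start dates.
    """
    return [bisect_right(period_starts, t) - 1 for t in times_phys]


# Made-up axis values for illustration only
time = decode_time([0, 7], "days since 2019-01-01 00:00:00")
time_phys = decode_time([0, 24, 168, 192], "hours since 2019-01-01 00:00:00")
indices = period_index(time_phys, time)  # [0, 0, 1, 1]
```

The resulting indices can be used to look up, for each x_phys time step, the matching entry of x along the time dimension.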

Dumping and loading control vectors#

Dump#

The control vector is dumped by the function:

pycif.plugins.controlvects.standard.dump.dump(self, cntrl_file, to_netcdf=False, dir_netcdf=None, ensemble=False, **kwargs)

Dumps a control vector into a pickle file. Does not save large correlations.

Args:
    self (pycif.utils.classes.controlvects.ControlVect): the Control Vector to dump
    cntrl_file (str): path to the file to dump as pickle
    to_netcdf (bool): save to netcdf files if True
    dir_netcdf (str): root path for the netcdf directory
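A call to this function might look like the sketch below. It assumes that dump is exposed as a method of the control vector object, as the self argument suggests, and that such an object is already available from a pyCIF run. The wrapper dump_controlvect and the file and directory names are hypothetical.

```python
from pathlib import Path


def dump_controlvect(controlvect, workdir):
    """Dump `controlvect` as a pickle and as NetCDF files under `workdir`.

    `controlvect` is expected to expose
    dump(cntrl_file, to_netcdf=..., dir_netcdf=...) with the documented
    signature; how the object is obtained is outside this sketch.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    cntrl_file = workdir / "controlvect.pickle"  # hypothetical file name
    dir_netcdf = workdir / "controlvect"         # hypothetical directory name
    controlvect.dump(str(cntrl_file), to_netcdf=True, dir_netcdf=str(dir_netcdf))
    return cntrl_file, dir_netcdf
```

Passing to_netcdf=True together with dir_netcdf requests the NetCDF output in addition to the pickle file.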

Load#

The control vector is loaded by the function:

pycif.plugins.controlvects.standard.dump.load(self, cntrl_file, component2load=None, tracer2load=None, target_tracer=None, ensemble=False, **kwargs)