How to make super-observations#

This tutorial explains how to build super-observations from a dataset for a given CTM configuration.

Super-observations are data built for assimilation by aggregating a larger set of raw measurements. They can be averages, medians, or other combinations of the raw data.

Building super-observations reduces the number of data points to assimilate while preserving relevant information and, potentially, avoiding the assimilation of incompatible or redundant measurements.

The computation of super-observations is not handled directly by the CIF, as it must be carefully designed for each case.

However, following the main steps described here ensures optimal use of the CIF’s capabilities and consistency across studies.

1. Run a forward simulation with the so-called raw data.#

XXX renvoyer a comment faire + points importants = bien ajuster la config du modele = a la fin de cette simu, tout doit etre fixe sauf le jeu d’obs XXX

2. Use the output monitor to select data#

This first simulation generates a monitor file at $WORKDIR/obsoperator/fwd_0000/obsvect/satellites/SPEC/monitor.nc. It contains information computed by the CIF using the CTM configuration: the spatial coordinates of the grid cell matching each data point (i, j, level), its temporal coordinates (tstep, tstep_glo, dtstep), and the simulated equivalent of the observation (sim).

For satellite data, an additional information file is generated to record the specific data associated with each obs: $WORKDIR/obsoperator/fwd_0000/chain/satellites/default_00001/infos_DATEDEB.nc, where DATEDEB is the start date of the simulation.

This complete set of information allows filtering data based on simulation criteria, for example removing data points with too large a discrepancy from the simulation.

XXX example codes for plotting the data+sim?XXXX

An example code to use monitor and info together, apply some filtering and changes and generate a “clean” monitor is provided here:

#########################################################
# This code is provided by: I. Pison
# contact: isabelle.pison@lsce.ipsl.fr
# date: June 2022
# project: H2020 VERIFY, ANR ARGONAUT, CNES Tosca ARGOS
# relation to CIF: uses plugins, not embedded in.
# It is an example, to be used as a basis
# for developing other work.
# No guarantee as to its compatibility or results
# is attached since it is not part of the CIF
# Please don't forget acknowledgements if you use it.
#########################################################

from pycif.utils.datastores import dump
import copy
import numpy as np
import pandas as pd
from pycif.utils.netcdf import readnc
import xarray as xr
from itertools import zip_longest

def retrieve_aks(fin,index,list_var): 

   # fin is the original input monitor 
   # containing satellite-specific information: ak, qa0, etc
   # fin is read to retrieve this info for data number index
   index = int(index)
   qa0, ak, pavg0, dryair = readnc(fin, list_var)

   return qa0[index].tolist(), ak[index].tolist(), pavg0[index].tolist(), dryair[index].tolist()

# WORKDIR of the forward simulation with all "raw" data
refdir = '/home/chimereicos/fwd_all_data/obsoperator/fwd_0000/'
# directory for the info file
dirinfo = 'chain/satellites/default_00001'
# directory for the output monitor file
dirmonit = 'obsvect/satellites/CH4/'
# files to use
monitor_in = '{}{}/monitor.nc'.format(refdir,dirmonit)
infos = '{}{}/infos_201901010000.nc'.format(refdir,dirinfo)

# basic data to provide in an input monitor
list_basic_cols = [ 'date', 'duration', 'station', 'network', 'parameter', 'lon', 'lat', 'obs', 'obserror' ]

# Monitor and info data to use
ds = dump.read_datastore(monitor_in)
dsinf = dump.read_datastore(infos,col2dump= ['orig_index', 'file_aks'])

# paste ds and dsinf together to get full information for each obs
ds3 = pd.concat( [ds,dsinf],axis = 1)

# example filter: filter out obs outside the domain
ds3 = ds3.loc[ ~np.isnan(ds3['orig_index']) ] 

# example change: set obserror at a given value
ds3 = ds3.assign(obserror = 20)

# generate a satellite input monitor with these new data:
# for each line in ds2, retrieve qa0,ak,pavg0,dryair from the right file
list_var = ['qa0', 'ak', 'pavg0', 'dryair']
ds4 = ds3.apply(lambda x: retrieve_aks(x.file_aks,x.orig_index,list_var), axis = 1)
# reformat ds4 to put satellite specific info in a xarray
# keep index to keep track of selection - no impact on the CIF
ds5 = pd.DataFrame((item for item in ds4), columns = list_var, index=ds3.index)
fixed_nlevels = len(ds5['ak'].iloc[0])
list_tab = []
for k,var in enumerate(list_var):
  list_tab.append(pd.DataFrame((item for item in ds5[var])).values)
# satellite specific info
ds_sat = xr.Dataset({'qa0': (['index', 'level'], list_tab[0]),
                     'ak': (['index', 'level'], list_tab[1]),
                     'pavg0': (['index', 'level_pressure'], list_tab[2]),
                     'dryair': (['index', 'level'], list_tab[3])},
                     coords={'index': ds5.index,
                             'level': range(fixed_nlevels),
                              'level_pressure': range(fixed_nlevels + 1)})
# basic data 
basic_data = ds3[list_basic_cols].to_xarray()

# merge basic and satellite-specific data
alldata = xr.merge([ds_sat, basic_data])

# create new clean input monitor
alldata.to_netcdf('new_monitor.nc')


XXX what about le bouchage de trous??XXXX

From this clean monitor, it may be useful to re-run a forward simulation, same as in step 1.

3. Generate super-observations#

From an output monitor, it is straightforward to combine data at given spatial and temporal scales, either in observation space (using lat, lon, date) or in model grid space (using i, j, tstep_glo). Using the matching info file, a super-observation input monitor can then be generated. Using code similar to the example above, super-observations can be generated by grouping the full-information dataframe (i.e. ds3, after or instead of lines 40–44) and retrieving the relevant satellite-specific information in the same way.

XXX put example codes XXX

4. Use the super-observations#

Use the monitor containing the super-observations for a forward simulation and an adjoint test.

If everything looks correct, it can then be used as input for inversions with the same CTM configuration.