How to make super-observations#
This tutorial explains how to build super-observations from a dataset for a given CTM configuration.
Super-observations are data built for assimilation by aggregating a larger set of raw measurements. They can be averages, medians, or other combinations of the raw data.
Building super-observations reduces the number of data points to assimilate while preserving relevant information and, potentially, avoiding the assimilation of incompatible or redundant measurements.
The computation of super-observations is not handled directly by the CIF, as it must be carefully designed for each case.
However, following the main steps described here ensures optimal use of the CIF’s capabilities and consistency across studies.
1. Run a forward simulation with the so-called raw data.#
XXX renvoyer a comment faire + points importants = bien ajuster la config du modele = a la fin de cette simu, tout doit etre fixe sauf le jeu d’obs XXX
2. Use the output monitor to select data#
This first simulation generates a monitor file at $WORKDIR/obsoperator/fwd_0000/obsvect/satellites/SPEC/monitor.nc. It contains information computed by the CIF using the CTM configuration: the spatial coordinates of the grid cell matching each data point (i, j, level), its temporal coordinates (tstep, tstep_glo, dtstep), and the simulated equivalent of the observation (sim).
For satellite data, an additional information file is generated to record the specific data associated with each obs: $WORKDIR/obsoperator/fwd_0000/chain/satellites/default_00001/infos_DATEDEB.nc, where DATEDEB is the start date of the simulation.
This complete set of information allows filtering data based on simulation criteria, for example removing data points with too large a discrepancy from the simulation.
XXX example codes for plotting the data+sim?XXXX
An example code to use monitor and info together, apply some filtering and changes and generate a “clean” monitor is provided here:
######################################################### # This code is provided by: I. Pison # contact: isabelle.pison@lsce.ipsl.fr # date: June 2022 # project: H2020 VERIFY, ANR ARGONAUT, CNES Tosca ARGOS # relation to CIF: uses plugins, not embedded in. # It is an example, to be used as a basis # for developing other work. # No guarantee as to its compatibility or results # is attached since it is not part of the CIF # Please don't forget acknowledgements if you use it. ######################################################### from pycif.utils.datastores import dump import copy import numpy as np import pandas as pd from pycif.utils.netcdf import readnc import xarray as xr from itertools import zip_longest def retrieve_aks(fin,index,list_var): # fin is the original input monitor # containing satellite-specific information: ak, qa0, etc # fin is read to retrieve this info for data number index index = int(index) qa0, ak, pavg0, dryair = readnc(fin, list_var) return qa0[index].tolist(), ak[index].tolist(), pavg0[index].tolist(), dryair[index].tolist() # WORKDIR of the forward simulation with all "raw" data refdir = '/home/chimereicos/fwd_all_data/obsoperator/fwd_0000/' # directory for the info file dirinfo = 'chain/satellites/default_00001' # directory for the output monitor file dirmonit = 'obsvect/satellites/CH4/' # files to use monitor_in = '{}{}/monitor.nc'.format(refdir,dirmonit) infos = '{}{}/infos_201901010000.nc'.format(refdir,dirinfo) # basic data to provide in an input monitor list_basic_cols = [ 'date', 'duration', 'station', 'network', 'parameter', 'lon', 'lat', 'obs', 'obserror' ] # Monitor and info data to use ds = dump.read_datastore(monitor_in) dsinf = dump.read_datastore(infos,col2dump= ['orig_index', 'file_aks']) # paste ds and dsinf together to get full information for each obs ds3 = pd.concat( [ds,dsinf],axis = 1) # example filter: filter out obs outside the domain ds3 = ds3.loc[ ~np.isnan(ds3['orig_index']) ] # example change: set obserror at a given value ds3 = ds3.assign(obserror = 20) # generate a satellite input monitor with these new data: # for each line in ds2, retrieve qa0,ak,pavg0,dryair from the right file list_var = ['qa0', 'ak', 'pavg0', 'dryair'] ds4 = ds3.apply(lambda x: retrieve_aks(x.file_aks,x.orig_index,list_var), axis = 1) # reformat ds4 to put satellite specific info in a xarray # keep index to keep track of selection - no impact on the CIF ds5 = pd.DataFrame((item for item in ds4), columns = list_var, index=ds3.index) fixed_nlevels = len(ds5['ak'].iloc[0]) list_tab = [] for k,var in enumerate(list_var): list_tab.append(pd.DataFrame((item for item in ds5[var])).values) # satellite specific info ds_sat = xr.Dataset({'qa0': (['index', 'level'], list_tab[0]), 'ak': (['index', 'level'], list_tab[1]), 'pavg0': (['index', 'level_pressure'], list_tab[2]), 'dryair': (['index', 'level'], list_tab[3])}, coords={'index': ds5.index, 'level': range(fixed_nlevels), 'level_pressure': range(fixed_nlevels + 1)}) # basic data basic_data = ds3[list_basic_cols].to_xarray() # merge basic and satellite-specific data alldata = xr.merge([ds_sat, basic_data]) # create new clean input monitor alldata.to_netcdf('new_monitor.nc')
XXX what about le bouchage de trous??XXXX
From this clean monitor, it may be useful to re-run a forward simulation, same as in step 1.
3. Generate super-observations#
From an output monitor, it is straightforward to combine data at given spatial and temporal scales,
either in observation space (using lat, lon, date) or in model grid space (using i, j, tstep_glo).
Using the matching info file, a super-observation input monitor can then be generated.
Using code similar to the example above, super-observations can be generated by grouping
the full-information dataframe (i.e. ds3, after or instead of lines 40–44)
and retrieving the relevant satellite-specific information in the same way.
XXX put example codes XXX
4. Use the super-observations#
Use the monitor containing the super-observations for a forward simulation and an adjoint test.
If everything looks correct, it can then be used as input for inversions with the same CTM configuration.