How to make super-observations#
Tutorial on how to make super-observations from a set of data and for a given set-up of a CTM.
Super-observations are data to be assimilated which are built from a set of more numerous data. Super-observations can be averages, medians or other combinations of these “raw” data.
Making super-observations is a way to reduce the number of data to assimilate while keeping a relevant amount of information and, potentially, avoiding the assimilation of incompatible information or of redundant information.
The computation of super-observations is not included as such in the CIF as it requires to be carefully thought out for each case.
Nevertheless, following the main steps described here ensures an optimal use of the CIF’s capacities and a good consistency in the various studies.
1. Run a forward simulation with the so-called raw data.#
XXX renvoyer a comment faire + points importants = bien ajuster la config du modele = a la fin de cette simu, tout doit etre fixe sauf le jeu d’obs XXX
2. Use the output monitor to select data#
From this first simulation, a monitor file is generated: $WORKIDR/obsoperator/fwd_0000/obsvect/satellites/SPEC/monitor.nc
. It contains information computed by the CIF with the configuration of the CTM: the spatial coordinates of the grid cell matching each data (i, j, level), its temporal coodinates (tstep, tstep_glo, dtstep) and the equivalent of the observation (sim).
In case of satellite data, an information file is also generated to keep track of the specific data associated to each obs
: $WORKDIR/obsoperator/fwd_0000/chain/satellites/default_00001/infos_DATEDEB.nc
with DATEDEB the date of the beginning of the simulation XXwhy??XX.
This complete set of information makes it possible to filter out data according to criteria taking into account the simulation e.g. filtering out data with too large a discrepancy with the simulation.
XXX example codes for plotting the data+sim?XXXX
An example code to use monitor and info together, apply some filtering and changes and generate a “clean” monitor is provided here:
######################################################### # This code is provided by: I. Pison # contact: isabelle.pison@lsce.ipsl.fr # date: June 2022 # project: H2020 VERIFY, ANR ARGONAUT, CNES Tosca ARGOS # relation to CIF: uses plugins, not embedded in. # It is an example, to be used as a basis # for developing other work. # No guarantee as to its compatibility or results # is attached since it is not part of the CIF # Please don't forget acknowledgements if you use it. ######################################################### from pycif.utils.datastores import dump import copy import numpy as np import pandas as pd from pycif.utils.netcdf import readnc import xarray as xr from itertools import zip_longest def retrieve_aks(fin,index,list_var): # fin is the original input monitor # containing satellite-specific information: ak, qa0, etc # fin is read to retrieve this info for data number index index = int(index) qa0, ak, pavg0, dryair = readnc(fin, list_var) return qa0[index].tolist(), ak[index].tolist(), pavg0[index].tolist(), dryair[index].tolist() # WORKDIR of the forward simulation with all "raw" data refdir = '/home/chimereicos/fwd_all_data/obsoperator/fwd_0000/' # directory for the info file dirinfo = 'chain/satellites/default_00001' # directory for the output monitor file dirmonit = 'obsvect/satellites/CH4/' # files to use monitor_in = '{}{}/monitor.nc'.format(refdir,dirmonit) infos = '{}{}/infos_201901010000.nc'.format(refdir,dirinfo) # basic data to provide in an input monitor list_basic_cols = [ 'date', 'duration', 'station', 'network', 'parameter', 'lon', 'lat', 'obs', 'obserror' ] # Monitor and info data to use ds = dump.read_datastore(monitor_in) dsinf = dump.read_datastore(infos,col2dump= ['orig_index', 'file_aks']) # paste ds and dsinf together to get full information for each obs ds3 = pd.concat( [ds,dsinf],axis = 1) # example filter: filter out obs outside the domain ds3 = ds3.loc[ ~np.isnan(ds3['orig_index']) ] # example change: set obserror at a given value ds3 = ds3.assign(obserror = 20) # generate a satellite input monitor with these new data: # for each line in ds2, retrieve qa0,ak,pavg0,dryair from the right file list_var = ['qa0', 'ak', 'pavg0', 'dryair'] ds4 = ds3.apply(lambda x: retrieve_aks(x.file_aks,x.orig_index,list_var), axis = 1) # reformat ds4 to put satellite specific info in a xarray # keep index to keep track of selection - no impact on the CIF ds5 = pd.DataFrame((item for item in ds4), columns = list_var, index=ds3.index) fixed_nlevels = len(ds5['ak'].iloc[0]) list_tab = [] for k,var in enumerate(list_var): list_tab.append(pd.DataFrame((item for item in ds5[var])).values) # satellite specific info ds_sat = xr.Dataset({'qa0': (['index', 'level'], list_tab[0]), 'ak': (['index', 'level'], list_tab[1]), 'pavg0': (['index', 'level_pressure'], list_tab[2]), 'dryair': (['index', 'level'], list_tab[3])}, coords={'index': ds5.index, 'level': range(fixed_nlevels), 'level_pressure': range(fixed_nlevels + 1)}) # basic data basic_data = ds3[list_basic_cols].to_xarray() # merge basic and satellite-specific data alldata = xr.merge([ds_sat, basic_data]) # create new clean input monitor alldata.to_netcdf('new_monitor.nc')
XXX what about le bouchage de trous??XXXX
From this clean monitor, it may be useful to re-run a forward simulation, same as in step 1.
3. Generate super-observations#
From an output monitor, it is easy to combine data at given spatial and temporal scales,
either in the data’s world (using lat, lon, date) or in the model’s grid (using i, j, tstep_glo).
Using the matching info file, a super-obs input monitor can then be generated.
Using the same type of code as above, the super-observations can be generated from grouping
on the dataframe containing the full information i.e. ds3
(after or instead of lines 40-44)
then performing the retrieving of the relevant satellite specific information in the same way.
XXX put example codes XXX
4. Use the super-observations#
Use the monitor containing the super-observations for a forward simulation and a adjoint test.
If everything is normal, it can then be used as input for inversions with the same configuration of the CTM.