Datastreams (datastream)

Available Datastreams

The following sub-types and datastreams are implemented in pyCIF so far:
- Backgrounds (datastream)
- Fields (datastream)
    - CAMS netcdf files (CAMS/netcdf)
    - CHIMERE INI_CONCS and BOUN_CONCS netcdf files (CHIMERE/icbc)
    - ECMWF grib2 data files (ECMWF/grib2)
    - Global averages from NOAA data (NOAA/glob_avg)
    - Gridded NetCDF initial conditions (gridded_netcdf/std)
    - ICON-ART initial and lateral boundary conditions (ICON-ART/icbc)
    - LMDz 4 72x96x19 grid photorates (oldLMDZ/photochem)
    - LMDz 4 72x96x19 grid prescribed concentrations (oldLMDZ/prescrconcs)
    - LMDz 4 72x96x19 grid restart (oldLMDZ/ic)
    - LMDz photorates (LMDZ/photochem)
    - LMDz prescribed concentrations (LMDZ/prescrconcs)
    - LMDz production and loss fields (LMDZ/prodloss3d)
    - LMDz restart concentrations (LMDZ/ic)
    - LMDz trajq files (LMDZ-trajq/netcdf)
    - TM5 initial condition files (TM5/ic)
    - Template plugin for BCs (BCs/template)
    - wrfchem/icbc
- Fluxes (datastream)
    - EDGAR/v5
    - Becker coastal fluxes (becker/ocean)
    - CHIMERE AEMISSIONS and BEMISSIONS netcdf files (CHIMERE/AEMISSIONS)
    - CarbonMonitor/netcdf
    - Copernicus Marine Service CO2 fluxes (CMEMS/std)
    - Dummy model - NetCDF (dummy/nc)
    - Dummy model - text (dummy/txt)
    - FLEXPART/nc
    - Fluxes from csv-formatted point sources (point_sources/std)
    - GCP at 1 degree x 1 degree (GCP_N2O/1x1)
    - Global Carbon Fluxes (GCP/1x1)
    - Global Fire Emission Database v4 (GFED4/std)
    - GridFED emission database (GridFED/std)
    - Gridded NetCDF surface flux (gridded_netcdf/std)
    - ICON-ART surface fluxes (ICON-ART/sflx)
    - INS/2012web
    - LMDZ/bin
    - LMDZ surface fluxes (LMDZ/sflx)
    - ORCHIDEE surface fluxes (orchidee/std)
    - TM5/std
    - TNO/netcdf
    - Template plugin for fluxes (flux/template)
    - VPRM/netcdf
    - VPRM1km/netcdf
    - wrfchem/std
- Meteos (datastream)
Documentation

Description
The datastream Plugin type includes interfaces to input data for pycif, with the exception of observations. It includes the sub-types flux, meteo and field.
It is used for the following purposes:
- fetching relevant input files for direct use by, e.g., CTMs, only linking to the original file
- reading relevant input files when data manipulation is required, e.g., for defining the control vector, or for auxiliary transformations such as temporal interpolation or horizontal regridding
- writing data from pycif to the corresponding format; this can be used either when data from pycif needs to be read as input by a CTM, or to share data from pycif in a known standard data format
Required parameters, dependencies and functions

Functions

A given datastream Plugin requires the following functions to work properly within pycif:
- fetch
- get_domain (optional)
- read
- write (optional)
Please find below details on these functions.
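Before the per-function details, the skeleton below sketches how these functions fit together in a single plugin module. The fetch, get_domain and read signatures follow the template documented below; the write signature is only an assumption for illustration, so check the flux template plugin for the exact interface expected by your pycif version.

```python
# Hypothetical skeleton of a datastream plugin module (e.g. the module of a
# flux plugin); the body of each function is left to the plugin developer.

def fetch(ref_dir, ref_file, input_dates, target_dir,
          tracer=None, component=None, **kwargs):
    """Locate/link input files and return (list_files, list_dates)."""
    raise NotImplementedError


def get_domain(ref_dir, ref_file, input_dates, target_dir, tracer=None):
    """Optional: build the Domain object describing the data grid."""
    raise NotImplementedError


def read(self, name, varnames, dates, files,
         interpol_flx=False, tracer=None, model=None, ddi=None, **kwargs):
    """Read the requested dates from the listed files into an xr.DataArray."""
    raise NotImplementedError


def write(name, data, out_file, **kwargs):
    """Optional: dump pycif data back to the native format.

    The signature shown here is only indicative.
    """
    raise NotImplementedError
```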
fetch

The fetch function determines what files and corresponding dates are available for running the present case.
The structure of the fetch function is shown below:
- pycif.plugins.datastreams.fluxes.flux_plugin_template.fetch(ref_dir, ref_file, input_dates, target_dir, tracer=None, component=None, **kwargs)
    Fetch files and dates for the given simulation interval. Determine what dates are available in the input data within the simulation interval. Link reference files to the working directory to avoid interactions with the outer world.
    The output should include input data dates encompassing the simulation interval, which means that, e.g., if input data are at the monthly scale and the simulation interval runs from 2010-01-15 to 2010-03-15, the output should at least include the input data dates for 2010-01, 2010-02 and 2010-03.
- Note:
    The three main arguments (ref_dir, ref_file and file_freq) can either be defined as dir, file and file_freq respectively in the relevant datavect/flux/my_spec paragraph in the yaml or, if not available there, they are fetched from the corresponding components/flux paragraph. If one of the three needs a default value, it can be defined in the input_arguments dictionary in __init__.py (see the sketch below).
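For example, a default file frequency could be declared in the plugin's __init__.py along the following lines. The exact keys accepted inside input_arguments ("doc", "default", ...) depend on the pycif version, so treat this as a hedged sketch rather than the definitive interface:

```python
# Hypothetical excerpt of the plugin's __init__.py: give "file_freq" a
# default value so the user does not have to repeat it in the yaml.
input_arguments = {
    "file_freq": {
        "doc": "Temporal resolution of the input files",
        "default": "1MS",  # e.g. monthly files; adjust to the actual data
    },
}
```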
- Args:
    - ref_dir (str): the path to the input files
    - ref_file (str): format of the input files
    - input_dates (list): simulation interval (start and end dates)
    - target_dir (str): where to copy
    - tracer: the tracer Plugin, corresponding to the paragraph datavect/components/fluxes/parameters/my_species in the configuration yaml; can be needed to fetch extra information given by the user
    - component: the component Plugin, same as tracer; corresponds to the paragraph datavect/components/fluxes in the configuration yaml
- Return:
    (dict, dict): returns two dictionaries, list_files and list_dates
    - list_files: for each date that begins a period, a list containing the names of the files that are available for the dates within this period
    - list_dates: for each date that begins a period, a list containing the date intervals (in the form of a list of two dates each) matching the files listed in list_files
- Note:
    The output format can be illustrated as follows (the dates are shown as strings, but datetime.datetime objects are expected):

    list_dates = {
        "2019-01-01 00:00": [["2019-01-01 00:00", "2019-01-01 03:00"],
                             ["2019-01-01 03:00", "2019-01-01 06:00"],
                             ["2019-01-01 06:00", "2019-01-01 09:00"],
                             ["2019-01-01 09:00", "2019-01-01 12:00"]],
        "2019-01-01 12:00": [["2019-01-01 12:00", "2019-01-01 15:00"],
                             ["2019-01-01 15:00", "2019-01-01 18:00"],
                             ["2019-01-01 18:00", "2019-01-01 21:00"],
                             ["2019-01-01 21:00", "2019-01-02 00:00"]],
    }

    list_files = {
        "2019-01-01 00:00": ["path_to_file_for_20190101_0000",
                             "path_to_file_for_20190101_0300",
                             "path_to_file_for_20190101_0600",
                             "path_to_file_for_20190101_0900"],
        "2019-01-01 12:00": ["path_to_file_for_20190101_1200",
                             "path_to_file_for_20190101_1500",
                             "path_to_file_for_20190101_1800",
                             "path_to_file_for_20190101_2100"],
    }

    In the example above, the native temporal resolution is 3-hourly, and files are available every 12 hours.
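The same structure can also be built programmatically. The sketch below reconstructs the two dictionaries above for 3-hourly data packed into 12-hourly blocks; the file naming pattern is made up for the illustration:

```python
import datetime

# Hypothetical reconstruction of the example above: 3-hourly intervals,
# grouped into one dictionary key per 12-hourly block.
start = datetime.datetime(2019, 1, 1)
step = datetime.timedelta(hours=3)
per_key = 4  # 4 x 3-hour intervals = one 12-hour block

list_dates, list_files = {}, {}
for block in range(2):
    key = start + block * per_key * step
    dates, files = [], []
    for i in range(per_key):
        d0 = key + i * step
        dates.append([d0, d0 + step])
        files.append("path_to_file_for_" + d0.strftime("%Y%m%d_%H%M"))
    list_dates[key] = dates
    list_files[key] = files
```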
- Note:
    There is no specific rule for sorting dates and files into separate keys of the output dictionaries. The usual rule is to have one dictionary key per input file, therein unfolding all the dates available in the corresponding file; with that rule, the content of list_files simply repeats the same file for every date of a given key.
    But any combination of keys is valid as long as the list of dates of each key corresponds exactly to the file with the same index. Hence, it is acceptable to have, e.g., one key with all dates and files, or one key per date even though there are several dates per file.
    The balance between the number of keys and the size of each key should be determined by the standard usage expected with the data. Overall, a good practice is to have one key in the input data for each sub-simulation for which it will be used afterwards by the model.
    For instance, CHIMERE emission files store hourly emissions for CHIMERE sub-simulations, typically 24-hour long. It thus makes sense to have one key per 24-hour period, containing the hourly emissions for that period; a fetch sketch following this convention is given below.
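Putting the pieces together, a minimal fetch for such a case could look like the following. The daily file pattern (ref_file used as a strftime pattern) and the hourly resolution are assumptions for the illustration only:

```python
import datetime
import os

def fetch(ref_dir, ref_file, input_dates, target_dir,
          tracer=None, component=None, **kwargs):
    """Hedged sketch: one daily input file per key, hourly intervals inside.

    Assumes ref_file is a strftime pattern (e.g. "emis_%Y%m%d.nc") and that
    input_dates = [start, end] covers the simulation interval.
    """
    start, end = input_dates
    day = datetime.datetime(start.year, start.month, start.day)

    list_files, list_dates = {}, {}
    while day < end:
        fname = day.strftime(ref_file)
        origin = os.path.join(ref_dir, fname)
        target = os.path.join(target_dir, fname)

        # Link the original file into the working directory
        if os.path.isfile(origin) and not os.path.isfile(target):
            os.symlink(origin, target)

        # One key per daily file, unfolding the 24 hourly intervals
        hours = [day + datetime.timedelta(hours=h) for h in range(24)]
        list_dates[day] = [[h, h + datetime.timedelta(hours=1)] for h in hours]
        list_files[day] = [target] * 24

        day += datetime.timedelta(days=1)

    return list_files, list_dates
```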
get_domain (optional)
- pycif.plugins.datastreams.fluxes.flux_plugin_template.get_domain(ref_dir, ref_file, input_dates, target_dir, tracer=None)
    Read information to define the horizontal and, if relevant, vertical domain of the data.
    There are several possible approaches:
    - read a reference file that is necessarily present in ref_dir
    - read a file among the available data files
    - read a file specified in the yaml, by using the corresponding variable name, for instance tracer.my_file
    From the chosen file, obtain the coordinates of the centers and/or the corners of the grid cells. If corners or centers are not available, deduce them from the available information.
- Warning:
    The grid must not be overlapping: e.g., for a global grid, the last grid cell must not be the same as the first one.
- Warning:
    Longitudes must be in the range [-180, 180]. For datasets with longitudes beyond -180 or 180, please shift them and adapt the read function accordingly.
- Warning:
    Order the center and corner latitudes and longitudes in increasing order.
- Note:
    If the domain information needs to be read from one of the files returned by the fetch function, one should use the variable tracer.input_files as follows:
    ref_file = list(itertools.chain.from_iterable(tracer.input_files.values()))[0]
- Args:
    - ref_dir (str): the path to the input files
    - ref_file (str): format of the input files
    - input_dates (list): simulation interval (start and end dates)
    - target_dir (str): where to copy
    - tracer: the tracer Plugin, corresponding to the paragraph datavect/components/fluxes/parameters/my_species in the configuration yaml; can be needed to fetch extra information given by the user
- Return:
    - Domain: a domain class object, with the definition of the grid cell center coordinates, as well as the corners
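As an illustration of the geometry requirements above, the sketch below reads 1D center coordinates from a NetCDF file with xarray, shifts longitudes into [-180, 180], and derives cell corners from the centers. The variable names ("lat", "lon") and the final hand-over to pycif's Domain plugin are assumptions to adapt to the actual data and pycif version:

```python
import numpy as np
import xarray as xr

def centers_to_corners(centers):
    """Derive cell edges from 1D cell centers sorted in increasing order."""
    mid = 0.5 * (centers[1:] + centers[:-1])
    first = centers[0] - 0.5 * (centers[1] - centers[0])
    last = centers[-1] + 0.5 * (centers[-1] - centers[-2])
    return np.concatenate([[first], mid, [last]])

def read_grid(nc_file):
    """Hedged sketch: extract grid centers/corners from a NetCDF file.

    Assumes 1D "lat"/"lon" coordinate variables; the returned arrays would
    then be used to build the Domain object expected by pycif.
    """
    with xr.open_dataset(nc_file) as ds:
        lon = ds["lon"].values
        lat = ds["lat"].values

    # Shift longitudes into [-180, 180] and keep coordinates in increasing
    # order; the read function must then reorder the data consistently.
    lon = np.sort(np.where(lon > 180, lon - 360, lon))
    lat = np.sort(lat)

    return {
        "lon_centers": lon,
        "lat_centers": lat,
        "lon_corners": centers_to_corners(lon),
        "lat_corners": centers_to_corners(lat),
    }
```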
read
- pycif.plugins.datastreams.fluxes.flux_plugin_template.read(self, name, varnames, dates, files, interpol_flx=False, tracer=None, model=None, ddi=None, **kwargs)
    Get fluxes from raw files and load them into pyCIF variables.
    The list of date intervals and corresponding files is directly provided, coming from what is returned by the fetch function. One should loop on dates and files and extract the corresponding temporal slice of data.
- Warning:
    Make sure to optimize the opening of files. There is a high chance that the same file has to be opened and closed over and over again when looping on the dates. If this is the case, make sure not to close it between dates.
- Args:
    - name (str): name of the component
    - varnames (list[str]): original names of variables to read; use name if varnames is empty
    - dates (list): list of the date intervals to extract
    - files (list): list of the files matching dates
- Return:
    - xr.DataArray: the actual data with dimensions time, levels, latitudes, longitudes
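To close the loop, here is a hedged sketch of a read implementation built on xarray. The assumption that each file carries a "time" coordinate, the nearest-neighbour time selection, and the reading of a single variable are illustrative choices that must be adapted to the actual datastream:

```python
import numpy as np
import xarray as xr

def read(self, name, varnames, dates, files, interpol_flx=False,
         tracer=None, model=None, ddi=None, **kwargs):
    """Hedged sketch: read the requested date intervals into one DataArray."""
    # Only the first variable is read in this simplified sketch
    varname = varnames[0] if varnames else name

    slices = []
    cache = {}  # keep files open instead of re-opening them for every date
    for (date_start, _), path in zip(dates, files):
        if path not in cache:
            cache[path] = xr.open_dataset(path)
        ds = cache[path]

        # Extract the slice at the start of the interval and force the
        # expected (time, lev, lat, lon) dimension order
        data = ds[varname].sel(time=date_start, method="nearest").values
        if data.ndim == 2:  # surface field: add a level axis
            data = data[np.newaxis, ...]
        slices.append(data)

    for ds in cache.values():
        ds.close()

    xmod = xr.DataArray(
        np.array(slices),
        coords={"time": [d[0] for d in dates]},
        dims=("time", "lev", "lat", "lon"),
    )
    return xmod
```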