Concatenation of observations from parsers standard/std#
Description#
This plugin is the entry point for pre-processing observations from various datastreams: it concatenates them into a single datastore (see here for details on the observation format) and feeds the observation vector.
The various observation datastreams are implemented as obsparsers in pyCIF.
Note
It is possible to specify either a list of providers or a single one:
- list of providers: given as a YAML paragraph in which each key is a sub-paragraph holding one provider's arguments (the key names that label the provider sub-paragraphs do not affect the pre-processing),
- single provider: the provider's arguments are given at the same YAML level as the measurement paragraph.
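The two forms above can be illustrated with a minimal Python sketch. It is not pyCIF code: `normalize_providers` is a hypothetical helper, the configurations are plain dictionaries standing in for the loaded YAML, and the labels, provider name and paths are invented for illustration. Only the key names `providers`, `provider`, `specname`, `dump_type`, `file_monitor` and `dir_obs` come from this page.

```python
def normalize_providers(measurements):
    """Return {label: provider_arguments} for both configuration forms.

    Hypothetical helper: `measurements` is a plain dict standing in for
    the loaded YAML paragraph of the measurements plugin.
    """
    # Keys that belong to the measurements plugin itself (the optional
    # arguments documented on this page), not to a provider.
    plugin_keys = {"plugin", "specname", "dump_type", "file_monitor",
                   "providers", "provider"}

    if "providers" in measurements:
        # List form: each key labels one provider sub-paragraph.
        # The label is arbitrary and is only used to tell providers apart.
        return {label: dict(args)
                for label, args in measurements["providers"].items()}

    if "provider" in measurements:
        # Single form: the provider arguments sit at the same level
        # as the plugin arguments, so they are separated out here.
        args = {k: v for k, v in measurements.items() if k not in plugin_keys}
        args["provider"] = measurements["provider"]
        return {measurements["provider"]: args}

    return {}


# Invented example configurations (labels, names and paths are assumptions).
multi = {
    "plugin": {"name": "standard", "version": "std", "type": "measurements"},
    "providers": {
        "first_network": {"dir_obs": "/data/first/"},
        "second_network": {"dir_obs": "/data/second/"},
    },
}
single = {
    "plugin": {"name": "standard", "version": "std", "type": "measurements"},
    "provider": "some_provider",
    "dir_obs": "/data/first/",
}

multi_providers = normalize_providers(multi)
single_providers = normalize_providers(single)
```

Both forms end up as the same kind of mapping, so any downstream loop over providers would not need to know which form was used in the YAML.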
For every provider specified in the YAML, the class method parser.parse_multiple_files(spec=spec) is called.
By default, this function is defined as:
- class pycif.utils.classes.obsparsers.ObsParser(plg_orig=None, orig_name='', **kwargs)
  Class for handling time series parsing from different data providers and data file formats.
  - parse_multiple_files(**kwargs)
    Parses multiple files specified by a glob pattern and stores the content into a datastore.
    - Args:
      self: the plugin with its describing arguments (in particular dir_obs)
    - Returns:
      dict: {obs_file} = df[obssite_id, parameter]
    - Note:
      By default, the function calls self.parse_file, which filters out NaNs and checks that all required columns are available.
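The default chain described above (glob over the files, one `parse_file` call per file, NaN filtering and a required-column check) can be sketched with the standard library alone. This is a stand-in written for illustration, not the pyCIF `ObsParser`: the class name `SketchParser`, the CSV layout, the `pattern` argument and the `obs` column are assumptions, and rows are kept as plain dictionaries rather than a DataFrame.

```python
import csv
import glob
import math
import os
import tempfile

# Hypothetical set of required columns; pyCIF's actual list may differ.
REQUIRED = ("obssite_id", "parameter", "obs")


class SketchParser:
    """Stand-in for an obsparser; NOT the pyCIF ObsParser class."""

    def __init__(self, dir_obs, pattern="*.csv"):
        self.dir_obs = dir_obs
        self.pattern = pattern

    def parse_file(self, path):
        """Check required columns, then drop rows whose value is NaN."""
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle)
            missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
            if missing:
                raise ValueError(f"{path}: missing columns {missing}")
            return [row for row in reader
                    if not math.isnan(float(row["obs"]))]

    def parse_multiple_files(self):
        """Parse every file matching the glob pattern in dir_obs."""
        paths = sorted(glob.glob(os.path.join(self.dir_obs, self.pattern)))
        # One entry per observation file, mirroring the {obs_file} mapping.
        return {os.path.basename(p): self.parse_file(p) for p in paths}


# Small demonstration on a temporary directory with one invented file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "site_a.csv"), "w", newline="") as handle:
        handle.write("obssite_id,parameter,obs\nA,CH4,1900.5\nA,CH4,nan\n")
    parsed = SketchParser(tmp).parse_multiple_files()
```

In this sketch the second row of `site_a.csv` is discarded because its `obs` value is NaN, and a file lacking one of the required columns raises an error instead of being silently skipped.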
For more complex providers that cannot simply be processed using the default processing chain (e.g., for satellites), it is possible to define a custom parse_multiple_files function in the corresponding ObsParser plugin.
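As a rough illustration of such an override, the sketch below replaces the default chain in a subclass. Everything in it is assumed for the example: `DefaultParser` is a simplified stand-in for the base class (instance methods instead of the class method mentioned above), and the JSON layout with `soundings` and a `quality` flag is an invented satellite-like format, not a format handled by pyCIF.

```python
import glob
import json
import os
import tempfile


class DefaultParser:
    """Simplified stand-in base class; NOT pyCIF's ObsParser."""

    def __init__(self, dir_obs):
        self.dir_obs = dir_obs

    def parse_file(self, path):
        raise NotImplementedError

    def parse_multiple_files(self):
        # Default chain: one parse_file call per file found in dir_obs.
        paths = sorted(glob.glob(os.path.join(self.dir_obs, "*")))
        return {os.path.basename(p): self.parse_file(p) for p in paths}


class SatelliteLikeParser(DefaultParser):
    """Hypothetical provider: each JSON file holds many soundings."""

    def parse_multiple_files(self):
        # Custom chain: read whole files and filter on a quality flag,
        # bypassing parse_file entirely.
        datastore = {}
        for path in sorted(glob.glob(os.path.join(self.dir_obs, "*.json"))):
            with open(path) as handle:
                orbit = json.load(handle)
            datastore[os.path.basename(path)] = [
                s for s in orbit["soundings"] if s["quality"] == 0
            ]
        return datastore


# Demonstration with one invented orbit file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "orbit_0001.json"), "w") as handle:
        json.dump({"soundings": [{"obs": 1.0, "quality": 0},
                                 {"obs": 2.0, "quality": 1}]}, handle)
    datastore = SatelliteLikeParser(tmp).parse_multiple_files()
```

The override keeps the same return shape as the default (one entry per observation file), so code that consumes the result would not need to distinguish between the two parsers.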
YAML arguments#
The following arguments are used to configure the plugin. pyCIF raises an exception at initialization if a mandatory argument is not specified, or if any argument does not match the accepted values or type:
Optional arguments#
- specname : str, optional
  Species name to use if different than the one in the corresponding datavect paragraph
- dump_type : str, optional, default “nc”
  How to dump intermediate observation datastores
- file_monitor : str, optional
  Path to a pre-formatted monitor file.
- providers : optional
  List of providers and corresponding arguments to parse. See the documentation of obsparser objects for further details: here
- provider : str, optional
  Used only if providers is not specified. Can be used if only one provider is to be used. In that case, the provider arguments are directly specified at the same level.
YAML template#
Please find below a template for a YAML configuration:
measurements:
  plugin:
    name: standard
    version: std
    type: measurements

  # Optional arguments
  specname: XXXXX  # str
  dump_type: XXXXX  # str
  file_monitor: XXXXX  # str
  providers: XXXXX  # any
  provider: XXXXX  # str