Concatenation of observations from parsers standard/std

Description

This plugin is the entry point for pre-processing observations from various datastreams: it concatenates them into a single datastore (see the pyCIF documentation on the observation format for details) and feeds the observation vector.

The various observation datastreams are implemented as obsparsers in pyCIF.

Note

Either a list of providers or a single provider can be specified:

  1. list of providers: a YAML paragraph in which each key names a sub-paragraph holding that provider's arguments (the key names are only labels and do not affect the pre-processing);

  2. single provider: the provider's arguments are given directly at the same YAML level as the rest of the measurements paragraph.
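To make the two layouts concrete, the sketch below writes them as the Python dictionaries a YAML loader would return and reduces both to one argument dictionary per provider. It is a hypothetical illustration, not pyCIF's internal code: the helper `normalize_providers`, the set `MEASUREMENT_KEYS`, the placeholder provider names, and the assumption that each sub-paragraph carries a `provider` key are all made up for this example; `dir_obs` is the provider argument mentioned in the `parse_multiple_files` docstring.

```python
from typing import Any

# Hypothetical: keys documented for the measurements paragraph itself.
# In single-provider mode, every other key is treated as a provider argument.
MEASUREMENT_KEYS = {"plugin", "specname", "dump_type", "file_monitor", "providers"}


def normalize_providers(paragraph: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one argument dictionary per provider (hypothetical helper)."""
    providers = paragraph.get("providers")
    if providers is not None:
        # Multi-provider layout: sub-paragraph names are only labels,
        # so they are dropped and only the argument dicts are kept.
        return [dict(spec) for spec in providers.values()]
    # Single-provider layout: provider arguments sit next to the
    # measurement options at the same level.
    return [{k: v for k, v in paragraph.items() if k not in MEASUREMENT_KEYS}]


plugin = {"name": "standard", "version": "std", "type": "measurements"}
multi = {
    "plugin": plugin,
    "providers": {
        "first": {"provider": "provider_a", "dir_obs": "/data/a/"},
        "second": {"provider": "provider_b", "dir_obs": "/data/b/"},
    },
}
single = {"plugin": plugin, "provider": "provider_a", "dir_obs": "/data/a/"}

print(normalize_providers(multi))
print(normalize_providers(single))
```

Both layouts end up as the same kind of list, which is why the names of the sub-paragraph keys have no effect on the result.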

For every provider specified in the YAML, the method parser.parse_multiple_files(spec=spec) is called. By default, this method is defined as:

class pycif.utils.classes.obsparsers.ObsParser(plg_orig=None, orig_name='', **kwargs)

Class for handling time series parsing from different data providers and data file formats.

parse_multiple_files(**kwargs)

Parses multiple files specified by a glob pattern and stores the content into a datastore.

Args:

self: the plugin with its describing arguments (in particular dir_obs)

Returns:

dict: {obs_file} = df[obssite_id, parameter]

Note:

By default, the function calls self.parse_file, which filters out NaNs and checks that all required columns are available.
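The default chain described above can be sketched with the Python standard library as follows. This is not pyCIF's implementation: pyCIF stores parsed observations in pandas DataFrames, whereas this sketch keeps rows as lists of dictionaries. The CSV layout, the column names in `REQUIRED_COLUMNS`, the helper `is_missing`, and the simplified `parse_file`/`parse_multiple_files` signatures are assumptions.

```python
import csv
import glob
import math
import os
import tempfile

# Hypothetical columns every observation file must provide.
REQUIRED_COLUMNS = ("station", "date", "obs")


def is_missing(value: str) -> bool:
    """Treat empty cells and NaN literals such as "nan" as missing."""
    return value.strip() == "" or math.isnan(float(value))


def parse_file(path: str) -> list[dict[str, str]]:
    """Check the required columns of one CSV file and drop NaN observations."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        missing = set(REQUIRED_COLUMNS) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"{path}: missing columns {sorted(missing)}")
        return [row for row in reader if not is_missing(row["obs"])]


def parse_multiple_files(dir_obs: str, pattern: str = "*.csv") -> dict[str, list[dict[str, str]]]:
    """Parse every file in dir_obs matching the glob pattern, keyed by path."""
    paths = sorted(glob.glob(os.path.join(dir_obs, pattern)))
    return {path: parse_file(path) for path in paths}


# Usage: one valid row and one NaN row; only the valid row is kept.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "site1.csv"), "w", newline="") as out:
        out.write("station,date,obs\nAAA,2020-01-01,410.2\nAAA,2020-01-02,nan\n")
    kept = {os.path.basename(p): len(rows) for p, rows in parse_multiple_files(tmp).items()}
print(kept)  # {'site1.csv': 1}
```

Keying the result by file path mirrors the documented return value, {obs_file} = df[obssite_id, parameter].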

For more complex providers that cannot be processed with the default processing chain (e.g., satellites), a custom parse_multiple_files function can be defined in the corresponding ObsParser plugin.
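As an illustration of such an override, the sketch below replaces the default per-file result with one that merges all files of one orbit into a single entry. pyCIF plugins are organized as modules, so the actual mechanism for supplying a custom parse_multiple_files may differ; this example uses plain Python subclassing, and `ObsParserStub`, `SatelliteParser`, the `*.txt` pattern, and the orbit-prefix file naming are all hypothetical.

```python
import glob
import os
import tempfile


class ObsParserStub:
    """Hypothetical stand-in for ObsParser, reduced to what this sketch needs."""

    def __init__(self, dir_obs: str) -> None:
        self.dir_obs = dir_obs

    def parse_file(self, path: str) -> list[str]:
        # Default per-file step: one record per non-empty line.
        with open(path) as handle:
            return [line.strip() for line in handle if line.strip()]

    def parse_multiple_files(self, **kwargs) -> dict[str, list[str]]:
        # Default chain: one entry per file matching the glob pattern.
        paths = sorted(glob.glob(os.path.join(self.dir_obs, "*.txt")))
        return {path: self.parse_file(path) for path in paths}


class SatelliteParser(ObsParserStub):
    """Hypothetical provider whose granule files must be merged per orbit."""

    def parse_multiple_files(self, **kwargs) -> dict[str, list[str]]:
        # Reuse the default chain, then regroup records by the file-name
        # prefix before "_", assumed here to be the orbit identifier.
        merged: dict[str, list[str]] = {}
        for path, records in super().parse_multiple_files(**kwargs).items():
            orbit = os.path.basename(path).split("_")[0]
            merged.setdefault(orbit, []).extend(records)
        return merged


with tempfile.TemporaryDirectory() as tmp:
    files = [("orbit1_a.txt", "r1\nr2\n"), ("orbit1_b.txt", "r3\n"), ("orbit2_a.txt", "r4\n")]
    for name, body in files:
        with open(os.path.join(tmp, name), "w") as out:
            out.write(body)
    n_default = len(ObsParserStub(tmp).parse_multiple_files())
    custom_result = SatelliteParser(tmp).parse_multiple_files()

print(n_default)  # 3
print(custom_result)  # {'orbit1': ['r1', 'r2', 'r3'], 'orbit2': ['r4']}
```

The override keeps the same entry point, so the calling code that loops over providers does not need to know which chain a given provider uses.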

YAML arguments

The following arguments are used to configure the plugin. pyCIF raises an exception at initialization if mandatory arguments are not specified, or if any argument does not have an accepted value or type:

Optional arguments

specname : str, optional

Species name to use if different from the one in the corresponding datavect paragraph.

dump_type : str, optional, default “nc”

Format used to dump intermediate observation datastores.

file_monitor : str, optional

Path to a pre-formatted monitor file.

providers : optional

List of providers and their corresponding arguments to parse. See the documentation of obsparser objects for further details.

provider : str, optional

Used only if providers is not specified, when a single provider is used. In that case, the provider's arguments are specified directly at the same level.
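pyCIF performs these checks itself at initialization. The sketch below only illustrates the idea of filling the documented defaults and rejecting wrongly typed optional arguments; `OPTIONAL_ARGS` and `check_arguments` are hypothetical names, and `object` is used to encode the "any" type of providers.

```python
from typing import Any

# Optional arguments as documented above: name -> (expected type, default).
# `object` encodes "any type"; None means "no default, leave unset".
OPTIONAL_ARGS: dict[str, tuple[type, Any]] = {
    "specname": (str, None),
    "dump_type": (str, "nc"),
    "file_monitor": (str, None),
    "providers": (object, None),
    "provider": (str, None),
}


def check_arguments(paragraph: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the paragraph with defaults filled and types checked."""
    checked = dict(paragraph)
    for key, (expected, default) in OPTIONAL_ARGS.items():
        if key not in checked:
            if default is not None:
                checked[key] = default
            continue
        if not isinstance(checked[key], expected):
            raise TypeError(
                f"{key!r} must be of type {expected.__name__}, "
                f"got {type(checked[key]).__name__}"
            )
    return checked


config = check_arguments({"specname": "CO2", "provider": "provider_a"})
print(config["dump_type"])  # nc
```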

YAML template

Please find below a template for a YAML configuration:

measurements:
  plugin:
    name: standard
    version: std
    type: measurements

    # Optional arguments
    specname: XXXXX  # str
    dump_type: XXXXX  # str
    file_monitor: XXXXX  # str
    providers: XXXXX  # any
    provider: XXXXX  # str