Concatenation of observations from parsers (standard / std)

Description

This plugin is the entry point for pre-processing observations from various datastreams: it concatenates them into a single datastore (see the documentation on the observation format for details) and feeds the observation vector.

Various observation datastreams are implemented as obsparsers in pyCIF.

Note

It is possible to either specify a list of providers or a single one:

  1. list of providers: as a Yaml paragraph in which each key introduces a sub-paragraph holding one provider's arguments (the key names are free labels and do not affect the pre-processing),

  2. single provider: the provider's arguments are given at the same Yaml level as the measurements paragraph.
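The two layouts can be pictured as the dictionaries a Yaml loader would produce. The sketch below is an illustration only, not pyCIF code: the helper normalize_providers, the labels first and second, the provider name provider_a and the dir_obs paths are all hypothetical and chosen for the example.

```python
# Illustrative only: plain dicts standing in for the parsed Yaml paragraph.
# Labels, provider names and paths are hypothetical, not real pyCIF values.

# Keys that belong to the measurements plugin itself (the plugin block plus
# the optional arguments documented on this page).
PLUGIN_KEYS = {"plugin", "specname", "dump_type", "file_monitor",
               "providers", "provider"}


def normalize_providers(measurements):
    """Return {label: provider_arguments} for either layout."""
    if "providers" in measurements:
        # Layout 1: each key is a free label, each value holds the arguments.
        return {label: dict(args)
                for label, args in measurements["providers"].items()}
    # Layout 2: provider arguments sit at the same level as the plugin keys.
    args = {key: value for key, value in measurements.items()
            if key not in PLUGIN_KEYS}
    return {measurements["provider"]: args}


multi = {
    "plugin": {"name": "standard", "version": "std"},
    "providers": {
        "first": {"dir_obs": "/path/to/a"},
        "second": {"dir_obs": "/path/to/b"},
    },
}
single = {
    "plugin": {"name": "standard", "version": "std"},
    "provider": "provider_a",
    "dir_obs": "/path/to/a",
}

multi_result = normalize_providers(multi)
single_result = normalize_providers(single)
```

In this sketch both layouts end up as one mapping from a label to a set of provider arguments, which is why the labels in the first layout have no influence on the result.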

For every provider specified in the Yaml, the class method parser.parse_multiple_files(spec=spec) is called. By default, this function is defined as:

class pycif.utils.classes.obsparsers.ObsParser(plg_orig=None, orig_name='', **kwargs)

Class for handling time series parsing from different data providers and data file formats.

parse_multiple_files(**kwargs)

Parses multiple files specified by a glob pattern and stores the content into a datastore.

Parameters:

self – the plugin with its describing arguments (in particular dir_obs)

Returns:

{obs_file} = df[obssite_id, parameter]

Return type:

dict

Note

By default, the function calls self.parse_file, which filters out NaNs and checks that all required columns are available.

For more complex providers that cannot simply be processed using the default processing chain (e.g., for satellites), it is possible to define a custom parse_multiple_files function in the corresponding ObsParser plugin.
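The default chain described above (glob the files, parse each one, check the columns, drop NaNs) can be sketched with the standard library alone. This is a simplified stand-in, not the pyCIF implementation: the CSV layout, the REQUIRED_COLUMNS list, the pattern argument and the free-function signatures are assumptions made for the example, and plain lists of rows replace the dataframes that pyCIF returns.

```python
import csv
import glob
import math
import os
import tempfile

# Hypothetical list of required columns; the real list is defined by pyCIF.
REQUIRED_COLUMNS = ("station", "parameter", "obs")


def parse_file(path):
    """Read one CSV file, check its columns, and drop rows whose obs is NaN."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [col for col in REQUIRED_COLUMNS if col not in header]
        if missing:
            raise ValueError(f"{path}: missing columns {missing}")
        rows = []
        for row in reader:
            value = float(row["obs"])
            if not math.isnan(value):
                rows.append({**row, "obs": value})
    return rows


def parse_multiple_files(dir_obs, pattern="*.csv"):
    """Parse every file matching the glob pattern; return {file name: rows}."""
    parsed = {}
    for path in sorted(glob.glob(os.path.join(dir_obs, pattern))):
        parsed[os.path.basename(path)] = parse_file(path)
    return parsed


# Usage: one temporary file with a valid row and a NaN row.
with tempfile.TemporaryDirectory() as dir_obs:
    with open(os.path.join(dir_obs, "site_a.csv"), "w", newline="") as handle:
        handle.write("station,parameter,obs\nsite_a,CH4,1900.5\nsite_a,CH4,nan\n")
    result = parse_multiple_files(dir_obs)
```

A provider whose files do not fit this one-file-one-table pattern would replace parse_multiple_files as a whole rather than parse_file, which mirrors the customization point mentioned above.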

Yaml arguments

The following arguments are used to configure the plugin. pyCIF will raise an exception at initialization if a mandatory argument is missing, or if any argument does not match its accepted values or type:

Optional arguments

specname: (optional)

Species name to use if different from the one in the corresponding datavect paragraph.

accepted type: str

dump_type: (optional, default: nc)

How to dump intermediate observation datastores

accepted type: str

file_monitor: (optional)

Path to a pre-formatted monitor file.

accepted type: str

providers: (optional)

List of providers and corresponding arguments to parse. See the documentation of obsparser objects for further details.

provider: (optional)

Used only if providers is not specified, for the case where a single provider is needed. The provider's arguments are then specified directly at the same level.

accepted type: str
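The type checking applied to these arguments at initialization can be pictured with the short sketch below. It is not pyCIF's own validation code: the function check_arguments and the two lookup tables are hypothetical, the accepted types are copied from the list above, and nc is assumed to be the default for dump_type.

```python
# Accepted types copied from this page; providers accepts any type and is
# therefore not checked. The checker itself is a hypothetical illustration.
ARGUMENT_TYPES = {
    "specname": str,
    "dump_type": str,
    "file_monitor": str,
    "provider": str,
}
# Assumption: nc is the default value of dump_type.
DEFAULTS = {"dump_type": "nc"}


def check_arguments(config):
    """Fill in defaults and raise TypeError on a wrongly typed argument."""
    checked = {**DEFAULTS, **config}
    for name, expected in ARGUMENT_TYPES.items():
        if name in checked and not isinstance(checked[name], expected):
            raise TypeError(
                f"{name} must be {expected.__name__}, "
                f"got {type(checked[name]).__name__}"
            )
    return checked


checked = check_arguments({"specname": "CH4"})
```

Calling check_arguments({"provider": 3}) raises a TypeError in this sketch, which is the kind of early failure the page describes.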

Yaml template

Please find below a template for a Yaml configuration:

measurements:
  plugin:
    name: standard
    version: std
    type: measurements

  # Optional arguments
  specname: XXXXX  # str
  dump_type: XXXXX  # str
  file_monitor: XXXXX  # str
  providers: XXXXX  # any
  provider: XXXXX  # str