Source code for pycif.plugins.datastreams.fields.bc_plugin_template.fetch

import os

import numpy as np
import pandas as pd
import datetime
from .....utils import path


def fetch(
        ref_dir, ref_file, input_dates, target_dir,
        tracer=None, component=None
):
    """
    Retrieves the required files according to the simulation and the
    data files available.

    Args
    ---------
    - ref_dir: directory where the original files are found
    - ref_file: (template) name of the original files
    - input_dates: list of two dates: the beginning and end of the simulation
    - target_dir: directory where the links to the original files are created

    Returns
    ---------
    - list_dates: a dictionary in which each key leads to a list of intervals
      [date_beginning, date_end] so that each interval is covered by one
      value taken from the matching file stored in list_files.
    - list_files: a dictionary in which each key leads to a list of files so
      that the list of intervals is covered by the values provided in
      these files.

    Choosing the keys for both dictionaries:
    the most efficient ways are to use either i) the dates at which the data
    files begin or ii) dates matching the typical use of this data.
    Example: if the data is typically used for generating BCs per day, use
    the dates of the days to simulate as keys.
    The idea is to avoid listing the same file under several keys because
    the read routine is called for each key.

    Examples for a simulation from 01-01-2001 00H00 to 01-02-2001 00H00 for
    which input BC files cover 24 hours at an hourly resolution:
    - data = annual data for 2001:
        - list_dates = { '01-01-2001': [[01-01-2001 00H00,
                                         31-12-2001 23H59]] }
        - list_files = { '01-01-2001': [[yearly_data_file]] }
    - data = hourly data in daily files:
        - list_dates = { '01-01-2001 00H00': [
                             [01-01-2001 00H00, 01-01-2001 01H00],
                             [01-01-2001 01H00, 01-01-2001 02H00],
                             [01-01-2001 02H00, 01-01-2001 03H00],
                             ...
                             [01-01-2001 23H00, 02-01-2001 00H00]] }
        - list_files = { '01-01-2001 00H00': [
                             daily_data_file_for_01/01/2001,
                             daily_data_file_for_01/01/2001,
                             daily_data_file_for_01/01/2001,
                             ...
                             ] }

    Notes
    --------
    - the information file_freq, provided in the yaml file and accessible
      through tracer.file_freq, is used here and only here.
    - the intervals listed in list_dates are used to perform the time
      interpolation. They must therefore be the smallest intervals during
      which the values are constant XXX poorly worded? XX.
      Example: if time profiles are applied (see XX for option apply_profile
      and how to provide the profile data) to yearly data, the intervals must
      be the intervals obtained after applying the profiles (e.g. monthly,
      hourly) and not the whole year.
    - the decumulation of fields is taken care of in read

    """

    print('Initialize dictionaries')
    list_files = {}
    list_dates = {}

    print("List dates in the simulation with a frequency matching "
          "the files' availability")
    print('i.e. case i) for generating keys')
    list_period_dates = pd.date_range(
        input_dates[0], input_dates[1], freq=tracer.file_freq)
    # XXX provide an adaptation for the case where file_freq is larger
    # than the gap between the two input_dates XX

    print('Fill in dictionary for each key')
    for dd in list_period_dates:
        print('Put here the building of the list of intervals for this key')
        # Example: to get an hourly resolution (assuming file_freq >= 1H)
        # list_hours = pd.date_range(
        #     dd, dd + pd.to_timedelta(tracer.file_freq), freq="1H")
        # WARNING: to_timedelta does not work with all frequencies!
        # list_dates[dd] = [[hh, hh + datetime.timedelta(hours=1)]
        #                   for hh in list_hours]

        print('Put here the retrieving of the file names for the intervals')
        # Example: file_freq >= 1H
        # a) retrieve file name for the key:
        # file = dd.strftime("{}/{}".format(ref_dir, ref_file))
        # b) repeat it for the one-hour intervals
        # list_files[dd] = (len(list_hours) * [file])

        print('Fetching as such = link the actual files in target_dir')
        # Example: if one file per key
        # target_file = "{}/{}".format(target_dir, os.path.basename(file))
        # path.link(file, target_file)

    return list_files, list_dates
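
The commented examples in the template can be assembled into a small standalone sketch for the "hourly data in daily files" case described in the docstring. This is only an illustration under stated assumptions: `build_hourly_intervals` is a hypothetical helper name, `file_freq` is passed directly as a string instead of being read from `tracer.file_freq`, and the linking step into `target_dir` is left out. It also drops the last element of the hourly range so that each key gets exactly one interval per hour covered by its file.

```python
import datetime

import pandas as pd


def build_hourly_intervals(ref_dir, ref_file, input_dates, file_freq="1D"):
    """Hypothetical helper: build list_files and list_dates for hourly
    data stored in files that each span file_freq (assumed >= 1 hour)."""
    list_files = {}
    list_dates = {}

    # One key per data file (case i in the docstring of fetch)
    list_period_dates = pd.date_range(
        input_dates[0], input_dates[1], freq=file_freq)

    one_hour = datetime.timedelta(hours=1)
    for dd in list_period_dates:
        # Beginning of every hourly interval covered by the file starting
        # at dd; date_range includes both bounds, so the last stamp
        # (the beginning of the next file) is dropped
        list_hours = pd.date_range(
            dd, dd + pd.to_timedelta(file_freq),
            freq=pd.Timedelta(hours=1))[:-1]
        list_dates[dd] = [[hh, hh + one_hour] for hh in list_hours]

        # The same file name is repeated for each interval of this key
        file_name = dd.strftime("{}/{}".format(ref_dir, ref_file))
        list_files[dd] = len(list_hours) * [file_name]

    return list_files, list_dates
```

Because `pd.date_range` includes its end bound, a key is also generated for the end date of the simulation, as in the template itself; whether that extra key is wanted depends on the data being fetched.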