Working directory structure
Every pyCIF run writes its outputs to a dedicated $WORKDIR directory
whose path is set by the workdir key in the YAML configuration file.
The directory is created automatically if it does not exist.
The tree below describes a complete forward-run working directory for the toy Gaussian model. Other configurations (inversions, multiple modes, ensemble methods) add further sub-directories but follow the same conventions.
$WORKDIR/
│
├── config.yml  ← copy of the YAML configuration used
├── pycif.log  ← full run log (name set by logfile:)
├── VERSION  ← git branch and commit hash for reproducibility
│
├── datavect/  ← data-vector inputs, resolved at startup
│   ├── {component}/
│   │   └── {parameter}/  ← raw input files as read by the datastream plugin
│   └── {component}.{parameter}.txt  ← file/date list (dump_debug: true only)
│
├── controlvect.pickle  ← prior / posterior control vector (forward / inversion)
├── controlvect/  ← control-vector in human-readable NetCDF format
│   └── {component}/
│       └── controlvect_{component}_{parameter}.nc  ← netCDF-format snapshot
│
├── model/  ← model-specific outputs and cached data
│   └── H_matrix.pickle  ← observation operator matrix (dummy model)
│
└── obsoperator/
    │
    ├── pipe_inputs.txt  ← data requirements of every transform (YAML)
    ├── transform_description.txt  ← inputs/outputs/precursors/successors
    ├── transform_pipe_forward.txt  ← forward execution order
    ├── transform_pipe_adjoint.txt  ← adjoint execution order
    │
    ├── fwd_0000/  ← forward run #0 (index incremented for each call)
    │   ├── controlvect.pickle  ← control vector used for this run
    │   ├── controlvect/  ← netCDF snapshot of the control vector
    │   ├── finished_transforms.txt  ← list of completed transforms (restart support)
    │   ├── obsvect/  ← simulated observation vector
    │   │   └── {component}/{parameter}/monitor.nc
    │   └── {YYYY-MM-DD_HH-MM}/  ← one sub-directory per sub-simulation period
    │       ├── {model inputs and outputs}
    │       └── chain/  ← files chained to the next sub-period (e.g. end-concentrations)
    │
    └── adj_0000/  ← adjoint run #0 (inversion and adj-TL test only)
        └── ...  ← same structure as fwd_0000/
Key files and conventions
pycif.log
The main run log. Its name is set by the logfile key in the YAML.
Verbosity is controlled by the verbose key (0 = errors only,
1 = info, 2 = debug).
VERSION
Records the git branch and commit hash at the time of the run.
Used for reproducibility: re-running from the same VERSION and
config.yml should give bit-identical results.
datavect/
Contains the input data as read and interpolated by each datastream plugin.
The exact sub-structure depends on the plugins used. When dump_debug:
true is set in the datavect YAML block, a {component}.{parameter}.txt
file is added for each tracer listing the resolved file paths and date ranges —
see Files for checks and debugging for details.
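To check quickly which tracers produced a listing file, a small script along the following lines may help. It is only a sketch: find_debug_listings is a hypothetical helper, not part of pyCIF, and it assumes that neither the component nor the parameter name contains a dot, so that the file name can be split on its first dot.

```python
from pathlib import Path


def find_debug_listings(workdir):
    """Map (component, parameter) to the matching datavect/*.txt listing file.

    Hypothetical helper, not part of pyCIF. Assumes file names of the form
    {component}.{parameter}.txt with no extra dots in either name.
    """
    listings = {}
    for path in sorted((Path(workdir) / "datavect").glob("*.txt")):
        # "flux.CO2.txt" has the stem "flux.CO2"; split it on the first dot
        component, sep, parameter = path.stem.partition(".")
        if sep and parameter:
            listings[(component, parameter)] = path
    return listings
```

Files directly under datavect/ whose name does not follow the {component}.{parameter}.txt pattern are ignored.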
obsoperator/fwd_NNNN/
One directory per call to the forward observation operator, numbered from
0000. Numbering ensures that multiple calls (e.g. in an inversion loop)
never overwrite each other. The run_id parameter passed to
obsoper() controls the index.
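As a rough illustration of the numbering convention, the sketch below lists the fwd_NNNN/ and adj_NNNN/ directories found under $WORKDIR/obsoperator/. The helper list_runs is hypothetical and not part of pyCIF; it relies only on the directory names shown in the tree and accepts any number of digits in the index.

```python
import re
from pathlib import Path

# Matches directory names such as fwd_0000 or adj_0012
RUN_DIR = re.compile(r"^(fwd|adj)_(\d+)$")


def list_runs(workdir):
    """Return (kind, index, path) tuples, sorted by kind and then by index.

    Hypothetical helper, not part of pyCIF: it only inspects the
    obsoperator/{fwd,adj}_NNNN directory names.
    """
    runs = []
    obsoper_dir = Path(workdir) / "obsoperator"
    if not obsoper_dir.is_dir():
        return runs
    for entry in obsoper_dir.iterdir():
        match = RUN_DIR.match(entry.name)
        if entry.is_dir() and match:
            runs.append((match.group(1), int(match.group(2)), entry))
    return sorted(runs, key=lambda run: (run[0], run[1]))
```

Converting the index to an integer keeps the ordering numeric, so that the runs of an inversion loop are returned in the order in which they were created.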
obsoperator/fwd_0000/{YYYY-MM-DD_HH-MM}/
One sub-directory per sub-simulation period (model subsimu_dates).
Contains all the model input files, output files, and intermediate products
for that period. The chain/ sub-directory holds fields that must be
passed forward in time (e.g. end-concentration fields used as initial
conditions for the next period).
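The sub-period directories can be listed in chronological order by parsing their names. The sketch below assumes that each directory name follows the {YYYY-MM-DD_HH-MM} pattern exactly; list_periods is a hypothetical helper, not a pyCIF function, and it makes no assumption about what the timestamp refers to within the period.

```python
from datetime import datetime
from pathlib import Path

# strptime equivalent of the {YYYY-MM-DD_HH-MM} directory pattern
PERIOD_FORMAT = "%Y-%m-%d_%H-%M"


def list_periods(run_dir):
    """Return (timestamp, path) pairs for each sub-period directory, oldest first.

    Hypothetical helper, not part of pyCIF.
    """
    periods = []
    for entry in Path(run_dir).iterdir():
        if not entry.is_dir():
            continue
        try:
            stamp = datetime.strptime(entry.name, PERIOD_FORMAT)
        except ValueError:
            # Not a period directory (e.g. obsvect/ or controlvect/)
            continue
        periods.append((stamp, entry))
    return sorted(periods, key=lambda period: period[0])
```

Each returned path can then be joined with "chain" to inspect the fields passed on to the following period.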
finished_transforms.txt
Written inside each fwd_NNNN/ directory. Contains a semicolon-separated
list of transforms that completed successfully. Used by the autorestart
mechanism: when autorestart: true is set in the obsoperator YAML
block, any transform already listed here is skipped on a restart, allowing
interrupted runs to resume from the point of failure.
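The skip logic described above can be mimicked outside pyCIF to see what a restart would still have to do. This is a minimal sketch, not pyCIF's actual implementation: both function names are hypothetical, and the parsing assumes a plain semicolon-separated list in which surrounding whitespace and empty entries can be ignored.

```python
from pathlib import Path


def read_finished_transforms(run_dir):
    """Return the set of transform names listed in finished_transforms.txt.

    Hypothetical helper, not part of pyCIF. A missing file yields an empty set.
    """
    path = Path(run_dir) / "finished_transforms.txt"
    if not path.is_file():
        return set()
    # Split on semicolons, dropping whitespace and empty entries
    return {name.strip() for name in path.read_text().split(";") if name.strip()}


def remaining_transforms(ordered_transforms, run_dir):
    """Keep, in their original order, the transforms not yet marked as finished."""
    finished = read_finished_transforms(run_dir)
    return [name for name in ordered_transforms if name not in finished]
```

The ordered list of transform names would have to be supplied by the caller, for example after reading it from transform_pipe_forward.txt, whose exact layout is not assumed here.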
Inversion-specific outputs
For variational inversions (mode: 4dvar), the following additional
directories appear under $WORKDIR:
$WORKDIR/
├── simulator/
│   ├── cost.txt  ← cost function value at each iteration (CSV format)
│   ├── cost.csv  ← same, machine-readable
│   ├── gradcost.txt  ← gradient norm at each iteration
│   └── gradcost.csv
└── controlvect/
    ├── controlvect_final.pickle  ← optimised posterior control vector
    └── controlvect/  ← netCDF posterior control vector
For ensemble methods (mode: EnSRF), an ensemble/ sub-directory
is created under $WORKDIR containing one sample directory per ensemble
member.
See also
Files for checks and debugging: diagnostic files generated during pipeline initialisation and execution.
Control vectors: format of the control-vector netCDF files.
Observation vectors: format of the observation-vector monitor.nc files.
Observations: structure and columns of monitor files.