Working directory structure

Every pyCIF run writes its outputs to a dedicated $WORKDIR directory whose path is set by the workdir key in the YAML configuration file. The directory is created automatically if it does not exist.

The tree below describes a complete forward-run working directory for the toy Gaussian model. Other configurations (inversions, multiple modes, ensemble methods) add further sub-directories but follow the same conventions.

$WORKDIR/
│
├── config.yml                  ← copy of the YAML configuration used
├── pycif.log                   ← full run log (name set by logfile:)
├── VERSION                     ← git branch and commit hash for reproducibility
│
├── datavect/                   ← data-vector inputs, resolved at startup
│   ├── {component}/
│   │   └── {parameter}/        ← raw input files as read by the datastream plugin
│   └── {component}.{parameter}.txt   ← file/date list (dump_debug: true only)
│
├── controlvect.pickle          ← prior / posterior control vector (forward / inversion)
├── controlvect/                ← control vector exported as netCDF for inspection
│   └── {component}/
│       └── controlvect_{component}_{parameter}.nc   ← netCDF-format snapshot
│
├── model/                      ← model-specific outputs and cached data
│   └── H_matrix.pickle         ← observation operator matrix (dummy model)
│
└── obsoperator/
    │
    ├── pipe_inputs.txt         ← data requirements of every transform (YAML)
    ├── transform_description.txt  ← inputs/outputs/precursors/successors
    ├── transform_pipe_forward.txt  ← forward execution order
    ├── transform_pipe_adjoint.txt  ← adjoint execution order
    │
    ├── fwd_0000/               ← forward run #0 (index incremented for each call)
    │   ├── controlvect.pickle  ← control vector used for this run
    │   ├── controlvect/        ← netCDF snapshot of the control vector
    │   ├── finished_transforms.txt  ← list of completed transforms (restart support)
    │   ├── obsvect/            ← simulated observation vector
    │   │   └── {component}/{parameter}/monitor.nc
    │   └── {YYYY-MM-DD_HH-MM}/  ← one sub-directory per sub-simulation period
    │       ├── {model inputs and outputs}
    │       └── chain/          ← files chained to the next sub-period (e.g. end-concentrations)
    │
    └── adj_0000/               ← adjoint run #0 (inversion and adj-TL test only)
        └── ...                 ← same structure as fwd_0000/

Key files and conventions

pycif.log

The main run log. Its name is set by the logfile key in the YAML configuration, and its verbosity by the verbose key (0 = errors only, 1 = info, 2 = debug).

VERSION

Records the git branch and commit hash of the code at the time of the run. It exists for reproducibility: re-running the commit recorded in VERSION with the same config.yml and the same input data should give bit-identical results.

datavect/

Contains the input data as read and interpolated by each datastream plugin. The exact sub-structure depends on the plugins used. When dump_debug: true is set in the datavect YAML block, a {component}.{parameter}.txt file is added for each tracer, listing the resolved file paths and date ranges; see Files for checks and debugging for details.

obsoperator/fwd_NNNN/

One directory per call to the forward observation operator, numbered from 0000. Numbering ensures that multiple calls (e.g. in an inversion loop) never overwrite each other. The run_id parameter passed to obsoper() controls the index.
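The numbering convention makes it straightforward to enumerate runs from a script. The following is a minimal sketch, not part of pyCIF: list_runs is a hypothetical helper, and it assumes only that run directories are named fwd_NNNN or adj_NNNN with a four-digit index, as in the tree above.

```python
import re
from pathlib import Path

# Matches the run-directory names shown in the tree: fwd_0000, adj_0000, ...
RUN_DIR_PATTERN = re.compile(r"^(fwd|adj)_(\d{4})$")


def list_runs(workdir, kind="fwd"):
    """Return [(index, path), ...] for obsoperator/{kind}_NNNN, sorted by index.

    Hypothetical helper: it only inspects directory names and does not
    call any pyCIF code.
    """
    obsoper_dir = Path(workdir) / "obsoperator"
    runs = []
    if not obsoper_dir.is_dir():
        return runs
    for entry in obsoper_dir.iterdir():
        match = RUN_DIR_PATTERN.match(entry.name)
        if entry.is_dir() and match and match.group(1) == kind:
            runs.append((int(match.group(2)), entry))
    return sorted(runs, key=lambda item: item[0])
```

Calling list_runs(workdir)[-1] on a non-empty result gives the most recent forward run, and kind="adj" selects adjoint runs instead.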

obsoperator/fwd_0000/{YYYY-MM-DD_HH-MM}/

One sub-directory per sub-simulation period (model subsimu_dates). Contains all the model input files, output files, and intermediate products for that period. The chain/ sub-directory holds fields that must be passed forward in time (e.g. end-concentration fields used as initial conditions for the next period).
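Because the sub-period directories encode their start date in the name, they can be recovered and ordered without opening any model file. The sketch below assumes the {YYYY-MM-DD_HH-MM} naming shown above and that chain/ sits directly inside each period directory; list_subperiods is a hypothetical helper, not a pyCIF function.

```python
from datetime import datetime
from pathlib import Path

# strptime equivalent of the {YYYY-MM-DD_HH-MM} naming convention
PERIOD_FORMAT = "%Y-%m-%d_%H-%M"


def list_subperiods(run_dir):
    """Return [(start_date, path, has_chain), ...] sorted by start date."""
    periods = []
    for entry in Path(run_dir).iterdir():
        if not entry.is_dir():
            continue
        try:
            start = datetime.strptime(entry.name, PERIOD_FORMAT)
        except ValueError:
            # Not a sub-period directory (e.g. obsvect/ or controlvect/)
            continue
        periods.append((start, entry, (entry / "chain").is_dir()))
    return sorted(periods, key=lambda item: item[0])
```

The has_chain flag is a quick way to spot periods whose chained fields are missing, for instance after an interrupted run.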

finished_transforms.txt

Written inside each fwd_NNNN/ directory. Contains a semicolon-separated list of transforms that completed successfully. Used by the autorestart mechanism: when autorestart: true is set in the obsoperator YAML block, any transform already listed here is skipped on a restart, allowing interrupted runs to resume from the point of failure.
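To check how far an interrupted run progressed, the file can be read directly. This is a minimal sketch under the stated assumption that entries are separated by semicolons; the exact form of each entry is not specified here, so the hypothetical read_finished_transforms helper simply strips whitespace and drops empty items.

```python
from pathlib import Path


def read_finished_transforms(run_dir):
    """Return the entries listed in {run_dir}/finished_transforms.txt.

    Assumes a semicolon-separated list; surrounding whitespace, line
    breaks and a trailing separator are tolerated.
    """
    path = Path(run_dir) / "finished_transforms.txt"
    if not path.is_file():
        return []  # nothing has completed yet, or not a run directory
    raw = path.read_text()
    return [name.strip() for name in raw.split(";") if name.strip()]
```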

Inversion-specific outputs

For variational inversions (mode: 4dvar), the following additional files and directories appear under $WORKDIR:

$WORKDIR/
├── simulator/
│   ├── cost.txt        ← cost function value at each iteration (text format)
│   ├── cost.csv        ← same, machine-readable
│   ├── gradcost.txt    ← gradient norm at each iteration
│   └── gradcost.csv
└── controlvect/
    ├── controlvect_final.pickle   ← optimised posterior control vector
    └── controlvect/               ← netCDF posterior control vector
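
The CSV files lend themselves to quick convergence checks. The column layout of cost.csv and gradcost.csv is not documented here, so the sketch below makes no assumption about column names: the hypothetical read_numeric_rows helper assumes a comma delimiter, keeps every row whose fields all parse as numbers, and skips anything else (such as a header line).

```python
import csv
from pathlib import Path


def read_numeric_rows(csv_path):
    """Return the fully numeric rows of a CSV file as lists of floats."""
    rows = []
    with Path(csv_path).open(newline="") as handle:
        for record in csv.reader(handle):
            if not record:
                continue  # blank line
            try:
                rows.append([float(field) for field in record])
            except ValueError:
                continue  # header or other non-numeric line
    return rows
```

For example, read_numeric_rows(Path(workdir) / "simulator" / "cost.csv") returns one list per numeric row; which column holds the cost value should be checked against an actual file.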

For ensemble methods (mode: EnSRF), an ensemble/ sub-directory is created under $WORKDIR containing one sample directory per ensemble member.

See also