Ensemble Square-Root Filter EnSRF/std#
Description#
Ensemble Square Root Filter (EnSRF) inversion.
This plugin is based on the CTDAS implementation (Peters et al., 2005). Its adaptation from CTDAS to the CIF is described in Thanwerdas et al. (2025).
Mathematical framework#
The EnSRF is a Monte Carlo approximation of the Kalman filter that represents the background error covariance \(\mathbf{B}\) implicitly through an ensemble of \(N\) prior perturbations.
Ensemble representation#
Let \(\mathbf{x}_b\) be the ensemble mean and \(\mathbf{X}_b \in \mathbb{R}^{n \times N}\) the matrix of deviations from the mean. The background error covariance is approximated as:
\[\mathbf{B} \approx \mathbf{P} = \frac{1}{N-1}\,\mathbf{X}_b\mathbf{X}_b^\top\]
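Similarly, for the observation operator \(\mathbf{H}\):
\[\mathbf{P}\mathbf{H}^\top \approx \frac{1}{N-1}\,\mathbf{X}_b\mathbf{Y}_b^\top, \qquad \mathbf{H}\mathbf{P}\mathbf{H}^\top \approx \frac{1}{N-1}\,\mathbf{Y}_b\mathbf{Y}_b^\top\]
where each column of \(\mathbf{Y}_b\) is obtained by running the (possibly non-linear) observation operator on the corresponding ensemble member and is expressed, like \(\mathbf{X}_b\), as a deviation from the mean.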
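The ensemble representation can be illustrated with a short NumPy sketch. It assumes a diagonal prior covariance \(\mathrm{diag}(\sigma^2)\), draws members with a seeded `numpy.random.default_rng`, and takes the observation-space deviations relative to the mean of the simulated observations; the helper names `make_ensemble` and `ensemble_statistics` are hypothetical and not part of pyCIF. The optional `unbias` step mimics the spirit of `unbias_ensemble` but is only an assumption about how such a rescaling could work.

```python
import numpy as np


def make_ensemble(x_prior, sigma, nsample, seed_id=0, unbias=False):
    """Draw nsample members around x_prior with independent Gaussian errors.

    Hypothetical helper: a diagonal prior covariance diag(sigma**2) is assumed.
    """
    rng = np.random.default_rng(seed_id)  # fixed seed -> reproducible ensemble
    eps = rng.standard_normal((x_prior.size, nsample))
    if unbias:
        # Re-centre and rescale each row so that its sample mean is 0 and
        # its sample standard deviation (ddof=1) is exactly 1
        eps -= eps.mean(axis=1, keepdims=True)
        eps /= eps.std(axis=1, ddof=1, keepdims=True)
    return x_prior[:, None] + sigma[:, None] * eps


def ensemble_statistics(ens, obs_op):
    """Return x_b, X_b, Y_b and the ensemble covariance estimates."""
    n_members = ens.shape[1]
    x_b = ens.mean(axis=1)
    X_b = ens - x_b[:, None]  # state deviations (n x N)
    # Run the (possibly non-linear) observation operator on each member
    sims = np.column_stack([obs_op(ens[:, k]) for k in range(n_members)])
    Y_b = sims - sims.mean(axis=1, keepdims=True)  # obs-space deviations
    scale = 1.0 / (n_members - 1)
    P = scale * X_b @ X_b.T      # approximates B
    PHt = scale * X_b @ Y_b.T    # approximates P H^T
    HPHt = scale * Y_b @ Y_b.T   # approximates H P H^T
    return x_b, X_b, Y_b, P, PHt, HPHt


# Usage with a linear operator H: the estimates reduce to P H^T and H P H^T
H = np.array([[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]])
sigma = np.array([1.0, 0.5, 2.0])
ens = make_ensemble(np.zeros(3), sigma, nsample=20, unbias=True)
x_b, X_b, Y_b, P, PHt, HPHt = ensemble_statistics(ens, lambda x: H @ x)
```

For a linear operator, subtracting the mean of the simulations or the simulation of the mean gives the same \(\mathbf{Y}_b\); the two choices only differ when the operator is non-linear.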
Analysis update (mean)#
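The posterior mean is updated as:
\[\mathbf{x}_a = \mathbf{x}_b + \mathbf{K}\left(\mathbf{y} - \mathbf{H}\mathbf{x}_b\right)\]
with the ensemble-based Kalman gain
\[\mathbf{K} = \mathbf{P}\mathbf{H}^\top\left(\mathbf{H}\mathbf{P}\mathbf{H}^\top + \mathbf{R}\right)^{-1} \approx \frac{1}{N-1}\,\mathbf{X}_b\mathbf{Y}_b^\top\left(\frac{1}{N-1}\,\mathbf{Y}_b\mathbf{Y}_b^\top + \mathbf{R}\right)^{-1}\]
where \(\mathbf{y}\) is the observation vector and \(\mathbf{R}\) the observation error covariance matrix.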
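The all-at-once mean update (the case `serial_optimization = False`) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the plugin's code: the function name `batch_mean_update` is hypothetical, and the gain is obtained by solving a linear system rather than forming \((\mathbf{H}\mathbf{P}\mathbf{H}^\top + \mathbf{R})^{-1}\) explicitly.

```python
import numpy as np


def batch_mean_update(x_b, X_b, Y_b, y, hx_b, R):
    """All-at-once analysis of the mean (hypothetical helper).

    x_b : (n,) prior mean,  X_b : (n, N) state deviations,
    Y_b : (p, N) simulated-observation deviations,  y : (p,) observations,
    hx_b : (p,) observations simulated from the prior mean,  R : (p, p).
    """
    n_members = X_b.shape[1]
    PHt = X_b @ Y_b.T / (n_members - 1)        # P H^T
    S = Y_b @ Y_b.T / (n_members - 1) + R      # H P H^T + R (symmetric)
    # K = P H^T S^{-1}: solve S K^T = (P H^T)^T instead of inverting S
    K = np.linalg.solve(S, PHt.T).T
    x_a = x_b + K @ (y - hx_b)
    return x_a, K


# One state element, one observation, H = 1, two members
x_a, K = batch_mean_update(
    x_b=np.array([0.0]),
    X_b=np.array([[1.0, -1.0]]),
    Y_b=np.array([[1.0, -1.0]]),
    y=np.array([4.0]),
    hx_b=np.array([0.0]),
    R=np.array([[2.0]]),
)
# P = 2, so K = 2 / (2 + 2) = 0.5 and x_a = 0 + 0.5 * 4 = 2
```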
Analysis update (deviations — “square root” step)#
The ensemble deviations are updated without perturbing the observations,
which avoids the sampling noise of the standard EnKF. For each observation
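\(y_j\) processed sequentially (serial_optimization = True):
\[\mathbf{X}^{(j)} = \mathbf{X}^{(j-1)} - \tilde{\alpha}_j\,\mathbf{k}_j\,\mathbf{Y}^{(j-1)}_{j,:}, \qquad \mathbf{X}^{(0)} = \mathbf{X}_b\]
where \(\mathbf{Y}^{(j-1)}_{j,:}\) is the \(j\)-th row of the simulated-observation deviations after the first \(j-1\) updates, \(\mathbf{k}_j\) is the Kalman gain for observation \(j\) computed from that same updated ensemble, and \(\tilde{\alpha}_j = \left(1 + \sqrt{R_{jj}/(\mathbf{H}\mathbf{P}\mathbf{H}^\top + \mathbf{R})_{jj}}\right)^{-1}\) is the EnSRF scalar factor. The analysis deviations \(\mathbf{X}_a\) are those obtained after the last observation.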
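The serial update can be sketched as follows, in the spirit of the classic serial EnSRF. The sketch assumes uncorrelated observation errors (a diagonal \(\mathbf{R}\) passed as `r_diag`) and updates the simulated-observation deviations alongside the state so that each new observation sees the already updated ensemble; the function name `serial_ensrf` is hypothetical. For a linear operator, the result should match the all-at-once Kalman analysis: the same mean, and a sample covariance equal to \((\mathbf{I} - \mathbf{K}\mathbf{H})\mathbf{P}\).

```python
import numpy as np


def serial_ensrf(x_b, X_b, hx_b, Y_b, y, r_diag):
    """Serial EnSRF update (hypothetical helper, diagonal R assumed).

    Returns the analysis mean and the analysis deviations.
    """
    x, X = x_b.astype(float), X_b.astype(float)      # astype returns copies
    hx, Y = hx_b.astype(float), Y_b.astype(float)
    n_members = X.shape[1]
    for j in range(y.size):
        yj = Y[j].copy()                               # deviations of simulated y_j
        denom = yj @ yj / (n_members - 1) + r_diag[j]  # (H P H^T + R)_jj
        k = X @ yj / (n_members - 1) / denom           # gain for the state
        k_obs = Y @ yj / (n_members - 1) / denom       # gain for simulated obs
        alpha = 1.0 / (1.0 + np.sqrt(r_diag[j] / denom))
        innovation = y[j] - hx[j]
        # The mean is updated with the full gain
        x += k * innovation
        hx += k_obs * innovation
        # Deviations use the reduced gain alpha * k (no perturbed observations)
        X -= alpha * np.outer(k, yj)
        Y -= alpha * np.outer(k_obs, yj)
    return x, X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, N, p = 4, 6, 3
    ens = rng.standard_normal((n, N))
    x_b = ens.mean(axis=1)
    X_b = ens - x_b[:, None]
    H = rng.standard_normal((p, n))
    r_diag = np.array([0.5, 1.0, 2.0])
    y = rng.standard_normal(p)
    x_a, X_a = serial_ensrf(x_b, X_b, H @ x_b, H @ X_b, y, r_diag)
    # All-at-once Kalman reference for a linear H
    P = X_b @ X_b.T / (N - 1)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + np.diag(r_diag))
    print("mean mismatch:", np.abs(x_a - (x_b + K @ (y - H @ x_b))).max())
    print("cov mismatch:", np.abs(X_a @ X_a.T / (N - 1) - (np.eye(n) - K @ H) @ P).max())
```

Both printed mismatches should be at the level of floating-point round-off.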
Assimilation windows and lag#
The inversion period is split into consecutive assimilation windows of length
window_length. When nlag > 1, each segment spans nlag windows,
allowing the observations in a given window to constrain state variables from
the preceding nlag - 1 windows (a “lag” smoother).
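A minimal sketch of how an inversion period could be cut into windows from a pandas frequency string and grouped into lag segments is shown below. The helper names are hypothetical, and the choices that the last window may be shorter and that segments slide forward by one window per cycle are assumptions, not a description of the plugin's internals.

```python
import pandas as pd


def assimilation_windows(start, end, window_length=None):
    """Split [start, end) into consecutive windows (hypothetical helper).

    window_length is a pandas frequency string such as '7D' or '1MS';
    None gives a single window covering the whole period.
    """
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    if window_length is None:
        return [(start, end)]
    edges = list(pd.date_range(start, end, freq=window_length))
    if not edges or edges[0] > start:
        edges.insert(0, start)   # e.g. '1MS' with a mid-month start date
    if edges[-1] < end:
        edges.append(end)        # assumption: shorter final window
    return list(zip(edges[:-1], edges[1:]))


def lag_segments(windows, nlag=1):
    """Group windows into segments of up to nlag consecutive windows.

    Assumption: the segment slides by one window per cycle, so the first
    window of each segment is constrained by observations from the
    nlag - 1 windows that follow it.
    """
    return [windows[i:i + nlag] for i in range(len(windows))]


windows = assimilation_windows("2020-01-01", "2020-04-01", "1MS")
segments = lag_segments(windows, nlag=2)
# 3 monthly windows; segments: [Jan, Feb], [Feb, Mar], [Mar]
```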
Localization#
When localization is configured, the Kalman gain is modulated element-wise
by a distance-based correlation function (Gaussian or exponential) to limit
spurious long-range correlations arising from finite ensemble size:
\[\tilde{K}_{ij} = \rho_{ij}\,K_{ij}\]
where \(\rho_{ij} \in [0, 1]\) depends on the distance between the \(i\)-th state element and the \(j\)-th observation.
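The element-wise (Schur) product with a distance-based correlation can be sketched as below. The exact functional forms are assumptions for illustration: a Gaussian taper \(\exp(-d^2/2L^2)\) for `decay_func: normal` and \(\exp(-d/L)\) for `exponential`, where \(L\) plays the role of `decay_length`; distances are assumed to be precomputed (for instance great-circle distances in km). The helper names are hypothetical.

```python
import numpy as np


def correlation(distance, decay_length, decay_func="normal"):
    """Distance-based correlation rho in [0, 1] with rho(0) = 1 (assumed forms)."""
    d = np.asarray(distance, dtype=float) / decay_length
    if decay_func == "normal":
        return np.exp(-0.5 * d**2)   # Gaussian taper
    if decay_func == "exponential":
        return np.exp(-d)            # exponential taper
    raise ValueError(f"unknown decay_func: {decay_func!r}")


def localize_gain(k_j, distance_j, decay_length, decay_func="normal"):
    """Schur product of the gain for observation j with rho.

    k_j        : (n,) Kalman gain for observation j
    distance_j : (n,) distance from observation j to each state element
    """
    return correlation(distance_j, decay_length, decay_func) * k_j


k = np.ones(3)
dist = np.array([0.0, 500.0, 5000.0])  # e.g. km
k_loc = localize_gain(k, dist, decay_length=1000.0)
```

Because \(\rho\) tends to zero with distance, gain entries for state elements far from the observation are damped towards zero while nearby entries are left almost unchanged.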
YAML arguments#
The following arguments are used to configure the plugin. pyCIF raises an exception at initialization if a mandatory argument is missing or if any argument does not match the accepted values or type:
- nsample : int, mandatory
Number of random samples in the ensemble
- reload_results : bool, optional, default False
Skip already-completed simulations and reload their results from disk.
- batch_sampling : bool, optional, default True
Run all ensemble members within a single observation-operator call. If
False, the members are spread over several operator calls, each holding at most max_nsamples_per_run members.
- batch_subjob : bool, optional, default False
Submit the batch ensemble run as a separate HPC job rather than executing it in-process.
- max_nsamples_per_run : int, optional, default 5
Maximum number of members per observation-operator call (only used when
batch_sampling = False).
- include_system_samples : bool, optional, default True
Include the three system-bound members (prior mean, prior perturbed, posterior mean) in each operator call when
batch_sampling = False. This is required when the model needs scaling factors as inputs. These three members count toward max_nsamples_per_run.
- serial_optimization : bool, optional, default True
Assimilate observations one at a time (EnSRF serial update). If
False, assimilate all observations simultaneously via matrix operations.
- window_length : str, optional
Length of each assimilation window as a pandas frequency string (e.g.
'7D', '1MS'). Defaults to the full simulation period.
- nlag : int, optional, default 1
Number of windows per assimilation segment (lag smoother).
nlag = 1 gives a standard filter; nlag > 1 allows later observations to constrain earlier windows.
- mean_propagwgt : optional, default 0
Weight applied to the ensemble mean when a new window enters the segment (used only when
nlag >= 2). Accepts a scalar or a mapping {component: weight}. Components without an explicit weight receive 0.
- localization : optional
Apply localization to every assimilated observation.
- Argument structure:
- decay_length : float, mandatory
Correlation length to apply.
- decay_func : “exponential” or “normal”, optional, default “normal”
Correlation function to apply.
- full_localization : bool, optional, default False
If True, apply both state-observation (space-obs) and observation-observation (obs-obs) localization. If False, apply only state-observation localization.
- restart_format : str, optional, default “restart_%Y%m%d%H.nc”
Format of the restart file to fetch after a posterior forward simulation.
- seed : bool, optional, default False
Use a fixed seed when generating the random samples, so that the ensemble is reproducible.
- seed_id : int, optional, default 0
Value of the NumPy seed to use when seed is True.
- unbias_ensemble : bool, optional, default False
Force the ensemble to have a mean and a standard deviation consistent with the distribution modes.
- set_deviations_equal : bool, optional, default False
During the generation of samples, ensure each window contains the same deviations from the mean to reduce noise cancellation.
- level_metrics : int, optional, default 1
Level defining which metrics are computed:
0. No metrics computed.
1. Only metrics that do not require the eigenvalues of the posterior error covariance matrix.
2. All metrics.
- flushrun : bool, optional, default False
Remove unnecessary directories created during the sampling.
- save_out_netcdf : bool, optional, default False
Save the prior and final posterior control vectors as NetCDF. This argument overrides the corresponding argument in the
controlvect.
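Two of the arguments above have simple counting rules that a short sketch can make concrete. When batch_sampling = False, members are grouped into operator calls of at most max_nsamples_per_run, and the three system-bound members count toward that cap when include_system_samples is True. For mean_propagwgt, components missing from a mapping receive 0. The helper names below are hypothetical, the component names in the usage lines are placeholders, and applying a scalar weight to every component is an assumption.

```python
N_SYSTEM_SAMPLES = 3  # prior mean, prior perturbed, posterior mean


def group_members(nsample, max_nsamples_per_run=5, include_system_samples=True):
    """Split ensemble member indices into operator calls (hypothetical helper).

    The system members occupy slots in every call when included, so fewer
    slots remain for ensemble members.
    """
    slots = max_nsamples_per_run - (N_SYSTEM_SAMPLES if include_system_samples else 0)
    if slots < 1:
        raise ValueError("max_nsamples_per_run leaves no room for ensemble members")
    members = list(range(nsample))
    return [members[i:i + slots] for i in range(0, nsample, slots)]


def resolve_propagation_weights(mean_propagwgt, components):
    """Map mean_propagwgt (scalar or {component: weight}) to every component."""
    if isinstance(mean_propagwgt, dict):
        # Components without an explicit weight receive 0
        return {c: float(mean_propagwgt.get(c, 0.0)) for c in components}
    # Assumption: a scalar applies to every component
    return {c: float(mean_propagwgt) for c in components}


calls = group_members(7)  # [[0, 1], [2, 3], [4, 5], [6]]
weights = resolve_propagation_weights({"fluxes": 0.5}, ["fluxes", "inicond"])
```

With the default max_nsamples_per_run = 5 and the system members included, only two ensemble members fit in each call, which is why raising the cap can sharply reduce the number of operator runs.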
Requirements#
This plugin requires the following plugins to run properly:
| Requirement name | Requirement type | Explicit definition | Any valid | Default name | Default version |
|---|---|---|---|---|---|
| obsvect | obsvect | False | True | standard | std |
| controlvect | controlvect | True | True | standard | std |
| obsoperator | obsoperator | True | True | standard | std |
| platform | platform | True | True | None | None |
YAML template#
Please find below a template for a YAML configuration:
mode:
  plugin:
    name: EnSRF
    version: std
    type: mode

  # Mandatory arguments
  nsample: XXXXX # int

  # Optional arguments
  reload_results: XXXXX # bool
  batch_sampling: XXXXX # bool
  batch_subjob: XXXXX # bool
  max_nsamples_per_run: XXXXX # int
  include_system_samples: XXXXX # bool
  serial_optimization: XXXXX # bool
  window_length: XXXXX # str
  nlag: XXXXX # int
  mean_propagwgt: XXXXX # any
  localization:
    decay_func: XXXXX # exponential|normal
    decay_length: XXXXX # float
    full_localization: XXXXX # bool
  restart_format: XXXXX # str
  seed: XXXXX # bool
  seed_id: XXXXX # int
  unbias_ensemble: XXXXX # bool
  set_deviations_equal: XXXXX # bool
  level_metrics: XXXXX # int
  flushrun: XXXXX # bool
  save_out_netcdf: XXXXX # bool
See also