Ensemble Square-Root Filter EnSRF/std#

Description#

Ensemble Square Root Filter (EnSRF) inversion.

Based on the CTDAS implementation (Peters et al., 2005). The CIF implementation derived from CTDAS is described in Thanwerdas et al. (2025).

Mathematical framework#

The EnSRF is a Monte Carlo approximation of the Kalman filter that represents the background error covariance \(\mathbf{B}\) implicitly through an ensemble of \(N\) prior perturbations.

Ensemble representation#

Let \(\mathbf{x}_b\) be the ensemble mean and \(\mathbf{X}_b \in \mathbb{R}^{n \times N}\) the matrix of deviations from the mean. The background error covariance is approximated as:

\[\mathbf{B} \approx \frac{1}{N-1}\,\mathbf{X}_b\mathbf{X}_b^\top\]

Similarly, for the observation operator \(\mathbf{H}\):

\[\mathbf{H}\mathbf{X}_b \approx \mathbf{Y}_b \in \mathbb{R}^{m \times N}\]

where each column of \(\mathbf{Y}_b\) is obtained by running the (possibly non-linear) observation operator on the corresponding ensemble member.
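The ensemble quantities above can be sketched with NumPy as follows. This is an illustrative sketch, not the plugin's code: the ensemble values, the operator `obs_operator` and the matrix `H_lin` are hypothetical, and the sketch assumes that \(\mathbf{Y}_b\) holds deviations from the ensemble-mean simulated observations, in the same way that \(\mathbf{X}_b\) holds deviations from \(\mathbf{x}_b\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 4, 3, 50  # state size, number of observations, ensemble size

# Hypothetical prior ensemble: one member per column.
ensemble = 1.0 + 0.5 * rng.standard_normal((n, N))

x_b = ensemble.mean(axis=1)      # ensemble mean, shape (n,)
X_b = ensemble - x_b[:, None]    # deviations from the mean, shape (n, N)
B = X_b @ X_b.T / (N - 1)        # sample estimate of B, shape (n, n)

# Hypothetical, mildly non-linear observation operator (illustration only).
H_lin = rng.standard_normal((m, n))

def obs_operator(x):
    return H_lin @ x + 0.01 * x[0] ** 2

# Run the operator on every member, then remove the mean in observation space
# (assumption: Y_b stores deviations, like X_b).
simulated = np.column_stack([obs_operator(ensemble[:, k]) for k in range(N)])
y_mean = simulated.mean(axis=1)
Y_b = simulated - y_mean[:, None]  # shape (m, N)
```

With this layout, `B` is never needed explicitly in the update: only the products of `X_b` and `Y_b` enter the gain.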

Analysis update (mean)#

The posterior mean is updated as:

\[\mathbf{x}_a = \mathbf{x}_b + \mathbf{K}\bigl(\mathbf{y} - \overline{\mathbf{Y}_b}\bigr)\]

where \(\overline{\mathbf{Y}_b}\) is the ensemble mean of the simulated observations, with the ensemble-based Kalman gain

\[\mathbf{K} = \frac{1}{N-1}\,\mathbf{X}_b \mathbf{Y}_b^\top \Bigl(\mathbf{R} + \frac{1}{N-1}\,\mathbf{Y}_b \mathbf{Y}_b^\top\Bigr)^{-1}\]
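As a minimal sketch of the two formulas above, the gain and the mean update can be written with NumPy. The function names `ensemble_gain` and `update_mean` and all test values are hypothetical, and `Y_b` is assumed to hold observation-space deviations. For a linear operator the ensemble gain should coincide with the usual Kalman gain built from the sample covariance, which the example compares against.

```python
import numpy as np

def ensemble_gain(X_b, Y_b, R):
    """K = (X_b Y_b^T / (N-1)) (R + Y_b Y_b^T / (N-1))^{-1}."""
    N = X_b.shape[1]
    cross = X_b @ Y_b.T / (N - 1)          # approximates B H^T, shape (n, m)
    innov_cov = R + Y_b @ Y_b.T / (N - 1)  # approximates H B H^T + R, shape (m, m)
    # Solve the transposed linear system instead of forming an explicit inverse.
    return np.linalg.solve(innov_cov.T, cross.T).T

def update_mean(x_b, y, y_mean, K):
    """x_a = x_b + K (y - mean of simulated observations)."""
    return x_b + K @ (y - y_mean)

# Hypothetical linear test case.
rng = np.random.default_rng(0)
n, m, N = 3, 2, 20
ensemble = rng.standard_normal((n, N))
x_b = ensemble.mean(axis=1)
X_b = ensemble - x_b[:, None]
H = rng.standard_normal((m, n))
Y_b = H @ X_b
R = np.diag([0.1, 0.2])
y = rng.standard_normal(m)

K = ensemble_gain(X_b, Y_b, R)
x_a = update_mean(x_b, y, H @ x_b, K)

# Reference Kalman gain built from the explicit sample covariance.
B = X_b @ X_b.T / (N - 1)
K_ref = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
```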

Analysis update (deviations — “square root” step)#

The ensemble deviations are updated without perturbing the observations, which avoids the sampling noise of the standard EnKF. For each observation \(y_j\) processed sequentially (serial_optimization = True):

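\[\mathbf{X}_a = \mathbf{X}_b - \tilde{\alpha}_j\,\mathbf{k}_j\,(\mathbf{H}\mathbf{X}_b)_{j,:}\]

where \(\mathbf{k}_j \in \mathbb{R}^{n}\) is the Kalman gain for observation \(j\), \((\mathbf{H}\mathbf{X}_b)_{j,:}\) is the \(j\)-th row of \(\mathbf{H}\mathbf{X}_b \approx \mathbf{Y}_b\), and \(\tilde{\alpha}_j = \bigl(1 + \sqrt{R_{jj}/(\mathbf{H}\mathbf{P}\mathbf{H}^\top + \mathbf{R})_{jj}}\bigr)^{-1}\) is the EnSRF scalar factor. Here \(\mathbf{P}\) denotes the ensemble covariance at the current step of the serial update (\(\mathbf{B}\) before the first observation is assimilated).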
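The serial update can be sketched as a loop over observations. This is a hedged illustration rather than the plugin's implementation: the function `serial_ensrf` and the test data are hypothetical, the observation errors are assumed uncorrelated (diagonal \(\mathbf{R}\)), and the observation-space mean and deviations are updated alongside the state so that later observations see the effect of earlier ones. For a linear operator, the result should match the analytical Kalman posterior computed from the sample covariance.

```python
import numpy as np

def serial_ensrf(x_b, X_b, y_mean, Y_b, y, r_diag):
    """Assimilate observations one at a time (diagonal R assumed)."""
    x, X = x_b.copy(), X_b.copy()
    ym, Y = y_mean.copy(), Y_b.copy()
    N = X.shape[1]
    for j in range(len(y)):
        hx = Y[j].copy()                    # row j of H X, shape (N,)
        d = hx @ hx / (N - 1) + r_diag[j]   # (H P H^T + R)_jj
        k = X @ hx / (N - 1) / d            # gain for the state, shape (n,)
        ky = Y @ hx / (N - 1) / d           # same gain in observation space
        alpha = 1.0 / (1.0 + np.sqrt(r_diag[j] / d))
        innov = y[j] - ym[j]
        x += k * innov                      # mean update
        ym += ky * innov
        X -= alpha * np.outer(k, hx)        # square-root deviation update
        Y -= alpha * np.outer(ky, hx)
    return x, X

# Hypothetical linear test case.
rng = np.random.default_rng(1)
n, m, N = 4, 3, 30
ensemble = rng.standard_normal((n, N))
x_b = ensemble.mean(axis=1)
X_b = ensemble - x_b[:, None]
H = rng.standard_normal((m, n))
r_diag = np.array([0.5, 1.0, 2.0])
y = rng.standard_normal(m)

x_a, X_a = serial_ensrf(x_b, X_b, H @ x_b, H @ X_b, y, r_diag)

# Analytical Kalman posterior for comparison.
B = X_b @ X_b.T / (N - 1)
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + np.diag(r_diag))
P_a_ref = (np.eye(n) - K @ H) @ B
```

No observation is perturbed anywhere in the loop; the factor `alpha` alone ensures that the updated deviations carry the posterior covariance.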

Assimilation windows and lag#

The inversion period is split into consecutive assimilation windows of length window_length. When nlag > 1, each segment spans nlag windows, allowing the observations in a given window to constrain state variables from the preceding nlag - 1 windows (a “lag” smoother).
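One way to picture the windows and segments is the sketch below. It uses only the standard library and assumes a fixed-length window (for example, `'7D'` taken as seven days); the helper names `split_windows` and `lagged_segments`, and the way segments are truncated at the start of the period, are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def split_windows(start, end, window_length):
    """Split [start, end) into consecutive windows of fixed length."""
    windows = []
    t = start
    while t < end:
        windows.append((t, min(t + window_length, end)))
        t += window_length
    return windows

def lagged_segments(windows, nlag):
    """For each newly assimilated window i, list the window indices in its segment."""
    return [list(range(max(0, i - nlag + 1), i + 1)) for i in range(len(windows))]

windows = split_windows(datetime(2020, 1, 1), datetime(2020, 1, 29), timedelta(days=7))
segments = lagged_segments(windows, nlag=3)
```

With four weekly windows and `nlag=3`, the observations of window 3 are allowed to adjust windows 1 and 2 as well, while window 0 has already left the segment.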

Localization#

When localization is configured, the Kalman gain is modulated element-wise by a distance-based correlation function (Gaussian or exponential) to limit spurious long-range correlations arising from finite ensemble size:

\[\mathbf{K}_{ij} \leftarrow \rho_{ij}\,\mathbf{K}_{ij}\]

where \(\rho_{ij}\) depends on the distance between the \(i\)-th state element and the \(j\)-th observation.
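The element-wise modulation can be sketched as follows. The exact functional forms used by the plugin are not stated here, so the two expressions below (\(e^{-d^2/(2L^2)}\) for "normal" and \(e^{-d/L}\) for "exponential", with \(L\) the decay length) are assumptions, as are the function names and the distance values.

```python
import numpy as np

def correlation(dist, decay_length, decay_func="normal"):
    """Distance-based correlation (assumed functional forms)."""
    if decay_func == "normal":
        return np.exp(-0.5 * (dist / decay_length) ** 2)
    if decay_func == "exponential":
        return np.exp(-dist / decay_length)
    raise ValueError(f"unknown decay_func: {decay_func!r}")

def localize_gain(K, dist, decay_length, decay_func="normal"):
    """K_ij <- rho_ij * K_ij, with dist[i, j] the state-to-observation distance."""
    return correlation(dist, decay_length, decay_func) * K

# Hypothetical gain and distances (same length unit as decay_length).
K = np.ones((3, 2))
dist = np.array([[0.0, 100.0], [50.0, 50.0], [200.0, 0.0]])
K_loc = localize_gain(K, dist, decay_length=100.0, decay_func="exponential")
```

State elements co-located with an observation keep their full gain, while distant ones are damped toward zero.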

YAML arguments#

The following arguments configure the plugin. pyCIF raises an exception at initialization if a mandatory argument is missing, or if any argument has an invalid value or type:

nsample : int, mandatory

Number of random samples (members) in the ensemble, \(N\).

reload_results : bool, optional, default False

Skip already-completed simulations and reload their results from disk.

batch_sampling : bool, optional, default True

Run all ensemble members within a single observation-operator call. If False, each member is submitted as a separate job.

batch_subjob : bool, optional, default False

Submit the batch ensemble run as a separate HPC job rather than executing it in-process.

max_nsamples_per_run : int, optional, default 5

Maximum number of members per observation-operator call (only used when batch_sampling = False).

include_system_samples : bool, optional, default True

Include the three system-bound members (prior mean, prior perturbed, posterior mean) in each operator call when batch_sampling = False. Required when the model needs scaling factors as inputs. These three members count toward max_nsamples_per_run.
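To illustrate how the cap interacts with the system members, the sketch below groups ensemble members into operator calls. It is only a plausible reading of the two options: the helper `plan_runs` is hypothetical and the actual scheduling in pyCIF may differ.

```python
def plan_runs(nsample, max_nsamples_per_run=5, include_system_samples=True):
    """Group random members into runs; system members use up part of the cap."""
    n_system = 3 if include_system_samples else 0
    per_run = max_nsamples_per_run - n_system
    if per_run < 1:
        raise ValueError("max_nsamples_per_run leaves no room for random members")
    members = list(range(nsample))
    return [members[i:i + per_run] for i in range(0, nsample, per_run)]

runs_with_system = plan_runs(5)
runs_without_system = plan_runs(5, include_system_samples=False)
```

Under these assumptions, the default cap of 5 leaves room for two random members per call once the three system members are included.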

serial_optimization : bool, optional, default True

Assimilate observations one at a time (EnSRF serial update). If False, assimilate all observations simultaneously via matrix operations.

window_length : str, optional

Length of each assimilation window as a pandas frequency string (e.g. '7D', '1MS'). Defaults to the full simulation period.

nlag : int, optional, default 1

Number of windows per assimilation segment (lag smoother). nlag = 1 gives a standard filter; nlag > 1 allows later observations to constrain earlier windows.

mean_propagwgt : optional, default 0

Weight applied to the ensemble mean when a new window enters the segment (used only when nlag >= 2). Accepts a scalar or a mapping {component: weight}. Components without an explicit weight receive 0.
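The scalar-or-mapping convention can be sketched as a small normalisation step. The helper `resolve_weights` and the component names are hypothetical; only the rule that unlisted components receive 0 comes from the description above.

```python
def resolve_weights(mean_propagwgt, components):
    """Return one weight per component from a scalar or a {component: weight} mapping."""
    if isinstance(mean_propagwgt, dict):
        # Components without an explicit weight receive 0.
        return {c: float(mean_propagwgt.get(c, 0.0)) for c in components}
    return {c: float(mean_propagwgt) for c in components}

components = ["flux", "inicond"]  # hypothetical component names
from_scalar = resolve_weights(0.5, components)
from_mapping = resolve_weights({"flux": 1.0}, components)
```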

localization : optional

Apply localization to every assimilated observation.

Argument structure:
decay_length : float, mandatory

Correlation length to apply.

decay_func : “exponential” or “normal”, optional, default “normal”

Correlation function to apply.

full_localization : bool, optional, default False

Apply both space-obs localization and obs-obs localization. If False, apply only space-obs localization.

restart_format : str, optional, default “restart_%Y%m%d%H.nc”

Format of the restart file to fetch after a posterior forward simulation.

seed : bool, optional, default False

Use a seed to generate random samples.

seed_id : int, optional, default 0

ID of the numpy seed to use.

unbias_ensemble : bool, optional, default False

Force the ensemble to have a mean and a standard deviation consistent with the distribution modes.
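The seeding and unbiasing options can be pictured with the sketch below. It rests on two assumptions that are not confirmed here: that `seed_id` acts as the integer seed of a NumPy generator, and that unbiasing means recentring and rescaling the samples so that their sample mean and standard deviation match the target values exactly. The helpers `draw_samples` and `unbias` are hypothetical.

```python
import numpy as np

def draw_samples(nsample, size, seed=False, seed_id=0):
    """Draw standard-normal samples, one member per column."""
    # Assumption: seed_id is used directly as the generator seed when seed is True.
    rng = np.random.default_rng(seed_id if seed else None)
    return rng.standard_normal((size, nsample))

def unbias(samples, mean=0.0, std=1.0):
    """Recentre and rescale each row to the target mean and standard deviation."""
    centred = samples - samples.mean(axis=1, keepdims=True)
    spread = samples.std(axis=1, ddof=1, keepdims=True)
    return mean + std * centred / spread

first = draw_samples(10, 4, seed=True, seed_id=0)
second = draw_samples(10, 4, seed=True, seed_id=0)
adjusted = unbias(first)
```

Seeding makes the two draws identical, which is what allows an inversion to be reproduced; unbiasing removes the sampling error in the first two moments of a small ensemble.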

set_deviations_equal : bool, optional, default False

During the generation of samples, ensure each window contains the same deviations from the mean to reduce noise cancellation.

level_metrics : int, optional, default 1

Level that defines which metrics to compute:

0: No metrics computed.
1: Only metrics that do not require the eigenvalues of the posterior matrix.
2: All metrics.

flushrun : bool, optional, default False

Remove unnecessary directories created during sampling.

save_out_netcdf : bool, optional, default False

Save the prior and final posterior vectors as NetCDF. This argument overrides the corresponding argument in the controlvect.

Requirements#

This plugin requires the following plugins to run properly:

Requirement name  Requirement type  Explicit definition  Any valid  Default name  Default version
obsvect           ObsVect           False                True       standard      std
controlvect       ControlVect       True                 True       standard      std
obsoperator       ObsOperator       True                 True       standard      std
platform          Platform          True                 True       None          None

YAML template#

Please find below a template for a YAML configuration:

mode:
  plugin:
    name: EnSRF
    version: std
    type: mode

  # Mandatory arguments
  nsample: XXXXX  # int

  # Optional arguments
  reload_results: XXXXX  # bool
  batch_sampling: XXXXX  # bool
  batch_subjob: XXXXX  # bool
  max_nsamples_per_run: XXXXX  # int
  include_system_samples: XXXXX  # bool
  serial_optimization: XXXXX  # bool
  window_length: XXXXX  # str
  nlag: XXXXX  # int
  mean_propagwgt: XXXXX  # any
  localization:
    decay_func: XXXXX  # exponential|normal
    decay_length: XXXXX  # float
    full_localization: XXXXX  # bool
  restart_format: XXXXX  # str
  seed: XXXXX  # bool
  seed_id: XXXXX  # int
  unbias_ensemble: XXXXX  # bool
  set_deviations_equal: XXXXX  # bool
  level_metrics: XXXXX  # int
  flushrun: XXXXX  # bool
  save_out_netcdf: XXXXX  # bool