The author has declared that no competing interests exist.

Simulator imperfection, often known as model error, is ubiquitous in practical data assimilation problems. Despite the enormous efforts dedicated to addressing this problem, properly handling simulator imperfection in data assimilation remains a challenging task. In this work, we propose an approach to dealing with simulator imperfection from the point of view of functional approximation, which can be implemented through a certain machine learning method, such as the kernel-based learning adopted in the current work. To this end, we start by considering a class of supervised learning problems, and then identify similarities between supervised learning and variational data assimilation. These similarities form the basis for us to develop an ensemble-based learning framework to tackle supervised learning problems, while achieving various advantages of ensemble-based methods over variational ones. After establishing the ensemble-based learning framework, we proceed to investigate the integration of ensemble-based learning into an ensemble-based data assimilation framework to handle simulator imperfection. In the course of our investigations, we also develop a strategy to tackle the issue of multi-modality in supervised-learning problems, and transfer this strategy to data assimilation problems to help improve assimilation performance. For demonstration, we apply the ensemble-based learning framework and the integrated, ensemble-based data assimilation framework to a supervised learning problem and a data assimilation problem with an imperfect forward simulator, respectively. The experimental results indicate that both frameworks achieve good performance in the relevant case studies, and that functional approximation through machine learning may serve as a viable way to account for simulator imperfection in data assimilation problems.

In recent years, the advent of the big data era has led to surging interest in handling big data assimilation problems in the data assimilation community [

In the past few years, there have been a series of investigations [

So far, our investigations have been mainly dedicated to tackling the issues of

Imperfection in forward simulators (also known as model error) is a ubiquitous problem in geophysical data assimilation practice. Imperfection arises when there are, e.g., unresolved fine-scale processes, missing or mis-specified physical processes, or incorrect boundary conditions in the course of developing a physics-based forward simulator. In the context of practical seismic history matching (SHM), for instance, one may expect that the rock physics model (RPM), as an essential part of the forward seismic simulator, is prone to imperfection, since the RPM is often built upon simplified assumptions about rock physics, and calibrated using core or well-log data at only a few locations.

In the course of identifying and handling simulator imperfection during data assimilation, a challenge involving the combined effects of imperfection and uncertain model state and/or parameters arises. For instance, when there are substantial gaps (residuals) between real and simulated observations, they may be attributed to simulator imperfection, to the inability of the assimilation algorithm to obtain globally optimal estimates of model state and/or parameters, or to both. As a result, a prerequisite for addressing the issue of simulator imperfection is to choose a method that helps untangle the gross effects of simulator imperfection and uncertain model state and/or parameters.

Currently, a common practice in this regard is to add a (typically additive) stochastic term to the forward simulator, as a simple way to represent simulator imperfection (see, for example, [

In the current work, we consider an approach that treats the modelling of simulator imperfection as a functional approximation problem, which can be solved using a certain machine learning method. To this end, we start from a supervised learning problem, in which one aims to find a function that maps a set of training inputs to a corresponding set of training outputs. We first show the similarities between supervised learning and variational data assimilation. Motivated by this observation, we then proceed to develop a derivative-free, ensemble-based learning framework to tackle a class of supervised learning problems. In doing so, we not only obtain all the benefits of using ensemble-based methods (to be discussed later), but also facilitate the integration of the proposed imperfection-handling method into an ensemble-based data assimilation framework, which is presented after introducing the ensemble-based learning framework.

For demonstration, we investigate the performance of the ensemble-based learning framework in a supervised learning problem. We identify a challenge which may arise when multi-modal training inputs are present in the learning process, and propose a strategy that helps overcome this problem. After that, we study a data assimilation problem with an imperfect forward simulator. Ensemble-based learning is then incorporated into an ensemble-based assimilation algorithm to tackle the data assimilation problem, while the insights and experience gained in the supervised learning problem are transferred to the data assimilation problem, helping improve the performance of data assimilation. Based on the results obtained in these two experiments, we conclude the current work with discussions and some thoughts of future work.

This section focuses on supervised learning, one type of machine learning problem. Formally, machine learning is defined as “a set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data, or to perform other kinds of decision making under uncertainty” [

As manifested in our previous discussions, the current work is related to supervised learning problems (SLP). Of particular interest here is the development of an ensemble-based learning method to tackle SLP. In the literature, ensemble learning has found wide application in various scenarios, including, for instance, ensemble kernel learning [

As will be shown later, the ensemble method to be presented in this work can be considered a special case of ensemble learning, in which the base-learners share the same learning model, but differ from each other in terms of the parameters associated with each individual model. In general, the ensemble method adopted in the current work can also be extended to include various types of learning models (e.g., support vector machines, deep neural networks, etc.; see [

We consider a class of supervised learning problems, in which we are given a set of N_s inputs, denoted by x_i, together with a corresponding set of N_s outputs, denoted by y_i. The objective is to find a function whose predictions g(x_i) match the outputs y_i (i = 1, 2, …, N_s) to a good extent. Note that, in general, the outputs y_i may be contaminated by certain noise.

To achieve the above objective, one can solve the SLP as a regularized empirical risk minimization (ERM) problem [ ], in which one minimizes the sum of an empirical risk term, measuring the mismatch between the outputs y_i and the predictions g(x_i), and a regularization term.
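To make the formulation concrete, the following Python sketch (not part of the original development; all function and variable names are illustrative, and an RBF-kernel model with a quadratic loss is assumed, matching the choices made later in this work) evaluates a regularized ERM objective with a background term:

```python
import numpy as np

def rbf_predict(x, centers, weights, scales):
    """Kernel prediction g(x) = sum_k w_k * exp(-(x - c_k)^2 / (2 * s_k^2))."""
    k = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * scales[None, :] ** 2))
    return k @ weights

def erm_objective(theta, theta_b, x_train, y_train, centers, noise_var, prior_var):
    """Regularized empirical risk: data mismatch plus deviation from background theta_b.
    theta stacks the kernel weights and scale parameters into one vector."""
    n_cp = centers.size
    weights, scales = theta[:n_cp], theta[n_cp:]
    residual = y_train - rbf_predict(x_train, centers, weights, scales)
    data_term = residual @ residual / noise_var          # empirical risk
    reg_term = (theta - theta_b) @ (theta - theta_b) / prior_var  # regularization
    return 0.5 * (data_term + reg_term)
```

Minimizing this objective over the parameter vector balances fitting the training outputs against staying close to the background.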

Clearly, without imposing any constraint on the functional

In addition, let us define the quantities in the SLP with a superscript s, to emphasize that they pertain to the supervised learning setting. A similar convention will be adopted later for notational convenience.

To facilitate the introduction of our idea, we first consider the situation in which the training inputs x_i follow a certain unimodal distribution. In this case, we choose the functionals accordingly, where θ^b stands for a (pre-chosen) initial guess (called the “background” hereafter) of the parameters θ.

Comparing Eqs (_{s} is dropped in

When the training inputs follow a multi-modal distribution, it may be necessary to cluster the training inputs into different groups (so that each group contains unimodal training inputs), and then estimate a set of parameters

In analogy to the advance of assimilation approaches from the conventional variational methods [

no need to develop a complicated and time-consuming adjoint system (“adjoint free”);

the capacity to provide a means of uncertainty quantification for the estimated results (“uncertainty quantification”);

the ability to handle large numbers of state and/or parameter variables (“algorithm scalability”);

straightforward and fast implementation (“implementation convenience”).

Employing this “ensemblizing” strategy, we reformulate the regularized ERM problem in an ensemble form, where N_e is the size of the ensemble, and each ensemble member θ_j has its own associated background θ_j^b.

Following the convention in ensemble-based methods, we choose C_y to be the covariance matrix of the observation noise in the outputs y_i, and C_θ to be the sample covariance matrix with respect to the ensemble of parameter vectors θ_j.
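For illustration, the sketch below implements a single, plain ensemble-smoother update of the parameter ensemble using these sample covariances. It is a simplified stand-in for the iterative ensemble smoother used later in this work; the function name and the use of perturbed observations are our own choices:

```python
import numpy as np

def ensemble_smoother_update(theta_ens, pred_ens, y_obs, c_y, rng):
    """One plain ensemble-smoother update of the parameter ensemble theta_ens
    (shape n_e x n_theta), given predicted outputs pred_ens (shape n_e x n_y),
    observations y_obs and observation error covariance c_y."""
    n_e = theta_ens.shape[0]
    d_theta = theta_ens - theta_ens.mean(axis=0)
    d_pred = pred_ens - pred_ens.mean(axis=0)
    c_tp = d_theta.T @ d_pred / (n_e - 1)     # parameter-output cross-covariance
    c_pp = d_pred.T @ d_pred / (n_e - 1)      # output sample covariance
    kal = c_tp @ np.linalg.inv(c_pp + c_y)    # Kalman-type gain
    # one perturbed copy of the observations per ensemble member
    y_pert = y_obs + rng.multivariate_normal(np.zeros(len(y_obs)), c_y, size=n_e)
    return theta_ens + (y_pert - pred_ens) @ kal.T
```

Note that the update is derivative-free: only ensembles of parameters and of simulated outputs are needed, which is one of the practical advantages listed above.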

As shown in [

Eqs (_{e}). For more information, see [

In addition, the update formula

After establishing an ensemble-based framework to handle SLP, we go back to discuss the concrete approach to functional approximation in

Specifically, following the RBF kernel approach to functional approximation in [

There are N_cp kernels κ_k, center points (CP) c_k, and scale parameters σ_k that influence the spreads of the kernels. In the current work, for simplicity, we pre-choose N_cp and the center points c_k, and estimate the weights w_k and the scale parameters σ_k, as indicated in

Eqs (

We adopt scale parameters σ_k that may have different values σ_{k,ℓ} along different axes x_ℓ of the input space. Accordingly, the total number of scale parameters depends on both the number N_cp of center points and the dimension of the inputs.
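As a sketch of this parametrization (assuming Gaussian RBF kernels; the array layout and function name are our own choices), a prediction with per-center, per-axis scale parameters can be written as:

```python
import numpy as np

def aniso_rbf_predict(x, centers, weights, scales):
    """RBF prediction with per-center, per-axis scale parameters.
    x: (n, d) inputs; centers: (n_cp, d); weights: (n_cp,); scales: (n_cp, d)."""
    # axis-wise scaled differences between each input and each center point
    diff = (x[:, None, :] - centers[None, :, :]) / scales[None, :, :]
    k = np.exp(-0.5 * np.sum(diff ** 2, axis=-1))   # (n, n_cp) kernel matrix
    return k @ weights
```

With this layout, each center point carries d scale parameters, so the parameter count grows as N_cp × (1 + d) (weights plus scales).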

In comparison to the previous work of [ ], the scale parameters σ_k not only adapt to different center points

In addition, in many existing publications, the scale parameters are often manually chosen. In contrast, the ensemble-based approach (Eqs (

After establishing ensemble-based kernel learning to deal with SLP, we investigate how this framework can be integrated into ensemble-based data assimilation to handle a class of data assimilation problems with imperfect forward simulators, which bear certain similarities to 4D SHM problems. We will first formulate a mathematical description of the assimilation problems, and then develop a solution that combines ensemble-based approaches to both supervised learning and data assimilation. Within this integrated, ensemble-based framework, the solution to the target data assimilation problems involves a certain joint estimation procedure, in which one aims to simultaneously estimate both model variables and parameters associated with a set of kernel functions. From this perspective, technically speaking, this type of data assimilation problems with imperfect forward simulators would not become substantially more complicated than the corresponding assimilation problems with perfect forward simulators. Indeed, as will be shown later, for data assimilation in the presence of an imperfect forward simulator, one can still use existing ensemble-based assimilation algorithms, although there is a need to modify the forward simulator by including a residual functional to account for possible imperfection.

We consider a data assimilation problem, in which the noisy observational data (observations) are denoted by d^o, with associated error covariance matrix C_d. For better comprehension, here we have deliberately avoided notational overlap with the preceding section as far as possible, since such distinctions will be useful for our discussions later.

In the current work, we assume that, for all _{z}, i.e.,

As a data assimilation problem, our objective is to estimate a set of model variables m, based on the observations d^o and some initial guess (background) of the model variables, m^b, in such a way that the estimates are as close to the ground truth m^tr as possible. In a typical setting, we have access to a certain forward simulator that generates simulated observations d^sim, i.e.,

Based on Eqs (^{o} and the simulations _{z}.

Following the idea of kernel approach to functional approximation (Eqs (^{cp}. Likewise, we define

It is worth noting an essential difference between SLP and data assimilation problems. In SLP (cf. the preceding equations), we have many pairs of inputs and outputs, x^s and y^s respectively, as the training data. In data assimilation problems, however, typically we only have access to a single realization of the outputs (observations) d^o at a given time instance and a given spatial location, whereas our purpose is to infer possible inputs (model variables) behind d^o. Often, due to the limited capacity of the assimilation algorithm, inconsistencies may exist between the estimates conditioned on d^o and the ground truth m^tr that generates the observations d^o. Because of this inconsistency and the limited sampling of observations (at a given time instance and a given spatial location), data assimilation problems with imperfect forward simulators tend to be more challenging than SLP, as we will see later.

The aforementioned difference between SLP and data assimilation motivates us to adopt a slightly different form, in which we specify a set of center points m^cp and the corresponding residuals d^{o,cp} associated with m^cp. In general, the choice of m^cp and d^{o,cp} may be case-dependent. For instance, if one has a set of hard data to condition on, they can be used to construct m^cp and d^{o,cp}.

With kernel-based functional approximation to the residuals, similar to [

As in Eqs ( ), the corresponding update formulae are obtained by replacing the quantities therein (e.g., C_y) with their data assimilation counterparts (e.g., C_d), respectively.

When there is no imperfection in the forward simulator (or when one believes so), one may choose not to introduce any correction mechanism. In this case, the parameter part

As will be shown later, even with a perfect forward simulator, it might be still beneficial to include a mechanism of model-error correction (i.e., the

In this section, we investigate the performance of ensemble-based kernel learning in a toy supervised learning problem. One of our focuses here is to demonstrate a challenge arising in the toy problem, and to develop a strategy that helps overcome this challenge. The insights obtained in the study shed light on certain limitations of, or cautions in using, the plain ensemble-based kernel learning framework, and on ways to improve performance. In turn, they help enhance data assimilation performance when integrating ensemble-based kernel learning into ensemble-based data assimilation.

The supervised learning problem is designed to mimic the situation of data assimilation with an imperfect forward simulator. Specifically, we consider a forward system whose outputs are contaminated by Gaussian noise, with the noise standard deviation set to max(10^{−6}, 0.1 × |y|) for each noise-free output y.

In addition, we assume that there exists another imperfect forward simulation system

For better visualization, we re-plot the reference and biased outputs over the input interval [−2, 2] in a separate, zoomed-in subplot.

In the SLP, our objective is to learn the residual function

For the purpose of comparison, in this work we adopt data mismatch and root mean squared error (RMSE) as performance measures. Following the notations in the previous sections, given the real observations d^o, the associated observation error covariance matrix C_d and an ensemble member m_j (or θ_j), the data mismatch of m_j (or θ_j) is defined as
the C_d-weighted quadratic mismatch between real and simulated observations, while the RMSE of the model m_j (in data assimilation problems) with respect to the reference model m^tr is
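In code form (a sketch; the variable names follow the notation above, and the definitions assume the standard quadratic forms of these measures), the two measures can be computed as:

```python
import numpy as np

def data_mismatch(d_obs, d_sim, c_d):
    """Data mismatch (d_sim - d_obs)^T C_d^{-1} (d_sim - d_obs)."""
    r = d_sim - d_obs
    return float(r @ np.linalg.solve(c_d, r))

def rmse(m_est, m_true):
    """Root mean squared error between an estimated and the reference model."""
    return float(np.sqrt(np.mean((m_est - m_true) ** 2)))
```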

Throughout this work, we use the iES in [

In the experiment, we generate a set of 10,000 input samples drawn from a univariate Gaussian distribution, and compute the corresponding residuals (between the observations d^o and the simulations d^sim). We randomly divide the set of input-residual pairs into two subsets: one with 8,000 (80%) of such pairs as the training dataset, and the remaining 2,000 (20%) as the cross-validation (CV) dataset. The training dataset is used to estimate the parameters associated with the selected kernel functions, whereas the CV dataset is not involved in learning these parameters. In a typical setting, the CV dataset can be adopted to select hyperparameter(s) in a learning algorithm. In this particular case, though, we do not have hyperparameter(s) to tune. Therefore, we simply use the CV dataset to inspect the performance of the learned parameters after the learning process is finished.

To employ the kernel approach to approximating the residual function, we need to specify N_cp center points. In principle, it is possible to consider both N_cp and the locations of the center points as parameters to be learned. For simplicity, however, we pre-choose both N_cp and the center points.

Bearing this in mind, in the experiment below, we let N_cp = 200. We also tested N_cp = 2000, which turned out to lead to results similar to what we will present below. Consequently, for brevity, below we focus on the cases with N_cp = 200, for which the number of parameters (including the weights w_k and the scale parameters σ_k, cf. the preceding equations) is 400.

An additional remark is that, in comparison to the histograms in

Now we discuss how to initialize the ensembles of kernel parameters, i.e., the weights w_k and the scale parameters σ_k. For convenience of discussion, let us denote the ensembles with respect to the initial weights and the initial scale parameters by W and S, respectively. We set the ensemble size N_e = 100 unless otherwise stated. In the initialization, σ^{ti} is the STD of the training inputs, and the initial scale parameters σ_{k,j} are random samples drawn from a normal distribution.

For a given ensemble member (i.e., a fixed index j), we initialize the weights w_{k,j} (k = 1, 2, …, N_cp) as follows. We first randomly select an input-label pair from the training dataset, and then find the weights (replacing w_k therein by w_{k,j}, k = 1, 2, …, N_cp) that approximately solve the following equation:
where I is the N_cp × N_cp identity matrix, and the term ε_j provides random perturbations. By repeating this procedure with N_e different input-label pairs (j = 1, 2, …, N_e), we obtain an initial ensemble of N_e different parameter vectors.
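A possible implementation of this initialization is sketched below. The exact linear system is specified by the equations above; here we assume a Gaussian RBF kernel matrix evaluated at the center points, and the ridge factor, perturbation model and function names are our own illustrative choices:

```python
import numpy as np

def init_weight_ensemble(x_train, y_train, centers, scale_ens, n_e, noise_std, rng):
    """Initialize an ensemble of weight vectors: for each member j, draw a random
    input-label pair, build a kernel matrix at the center points with member j's
    scales, and solve a regularized linear system for the weights."""
    n_cp = centers.size
    weight_ens = np.empty((n_e, n_cp))
    for j in range(n_e):
        i = rng.integers(len(x_train))          # random input-label pair
        s = scale_ens[j]
        # kernel matrix evaluated at the center points themselves
        k_cc = np.exp(-(centers[:, None] - centers[None, :]) ** 2
                      / (2.0 * s[None, :] ** 2))
        # right-hand side: the selected label spread to the center points, perturbed
        rhs = y_train[i] * np.exp(-(centers - x_train[i]) ** 2 / (2.0 * s ** 2))
        rhs = rhs + noise_std * rng.standard_normal(n_cp)
        weight_ens[j] = np.linalg.solve(k_cc + 1e-6 * np.eye(n_cp), rhs)
    return weight_ens
```

Because each member uses a different training pair, a different perturbation and its own scales, the resulting weight ensemble carries spread that the subsequent ensemble update can exploit.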

For further demonstration, in Panels (a) and (c) of

Panels (a) and (c) show the initial and final ensembles of predictions (with respect to the case of unimodal training inputs), obtained by adding to the biased curve the corresponding ensembles of residual terms, which are computed using Eqs (

To generate more training data, here we consider a scenario with multi-modal training inputs. We will first identify a challenge for the ensemble-based learning algorithm to handle multi-modal training inputs, and then investigate a strategy that helps overcome this problem.

In the experiment, we generate a set of 10,000 input samples from the distribution

In the sequel, we first illustrate what will happen if one directly applies the ensemble-based learning algorithm to the training data with multi-modal inputs. In the experiment, we still adopt 200 center points that are evenly distributed over the interval [−6, 6). As in the previous sub-section, the ensemble-based learning algorithm is directly adopted to update 400 kernel parameters, namely, the scale and weight parameters associated with each center point. However, it turns out that, in the presence of multi-modal inputs, a straightforward application of the ensemble-based learning algorithm may not achieve satisfactory performance. This point is demonstrated in

For better visualization, in Panel (d) we re-plot the reference, the biased and the mean corrected curves over the input interval [−2, 2] in a separate, zoomed-in subplot.

The under-performance of plain, ensemble-based algorithms in handling multi-modal variables is also discussed in the literature, see, for example, [

More specifically, suppose that the multi-modal inputs are clustered into N_cl mutually exclusive subsets, and in each subset, the pdf of the inputs is modelled by a certain Gaussian pdf. In other words, the overall pdf is a Gaussian mixture, in which w_s is the weight associated with the s-th Gaussian component, with mean μ_s and STD σ_s. For each cluster, say the s-th one, we learn its own ensemble of kernel parameters (of size N_e), where

In terms of the parametrization strategy adopted in SLP, a noticeable feature in the case of multi-modal training inputs is that each cluster of training data has its own ensemble of kernel parameters associated with the same set of center points. Following the preceding discussion, the total number of kernel parameters becomes 2 × N_cp × N_cl, larger than that in the case of unimodal training inputs. This may thus be considered an additional way to improve the capacity of a learning model.

In the first experiment, we investigate the case where the number of clusters is the same as the number of modes in the training inputs, i.e., N_cl = 3. We use the MATLAB function “fitgmdist” to estimate the parameters of the Gaussian mixture model, namely the weight w_s, mean μ_s and variance σ_s^2 of each component (cf.

| | Cluster 1 (C1) | Cluster 2 (C2) | Cluster 3 (C3) |
|---|---|---|---|
| Number of data | 8003 | 8003 | 7994 |
| Estimated weight | 0.3331 | 0.3344 | 0.3325 |
| Estimated mean | 5.0015 | −0.0074 | −5.0049 |
| Estimated variance | 0.9754 | 1.0155 | 0.9680 |

With the aforementioned settings, in principle one can update the kernel parameters associated with each cluster in parallel, although in the current work, such updates are conducted in a sequential manner, namely, C1 → C2 → C3.
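The cluster-weighted prediction used in this multi-modal learning strategy can be sketched as follows (assuming univariate Gaussian mixture components; the per-cluster predictors are stand-ins for the learned kernel models, and the function names are our own):

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def mixture_prediction(x, gmm_weights, gmm_means, gmm_vars, cluster_predictors):
    """Combine per-cluster predictions, weighted by GMM membership probabilities."""
    dens = np.array([w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(gmm_weights, gmm_means, gmm_vars)])
    resp = dens / dens.sum(axis=0)              # responsibilities per cluster
    preds = np.array([f(x) for f in cluster_predictors])
    return np.sum(resp * preds, axis=0)
```

Near the center of one mixture component, the corresponding cluster's predictor dominates the combined output, which is the intended behavior of the strategy.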

For the CV dataset, we do not pre-cluster the data points into different clusters. To compute the data mismatch with respect to the CV dataset, we use all the CV data points (6000 in total). For better comprehension, given a CV input x^cv and an ensemble of kernel parameters θ_{j,s} (j = 1, 2, …, N_e) for a certain cluster s, we compute the membership weight w_s(x^cv) with respect to the GMM (cf. the preceding equations), and weight the cluster-wise prediction by w_s(x^cv) to obtain the predicted output. In this way, we are able to cross-validate the impacts of supervised learning within individual clusters. As reported in Panels (b), (d) and (f) of

Similar to

For visualization, we plot scale (left column) and weight (right column) parameters associated with different clusters separately.

Similar to Figs

In the experiment, the number N_cl of clusters is 3, the same as the number of modes in the training inputs. Note that the learning process is carried out cluster by cluster.

The previous results indicate that, when the MMLS is adopted and the number of clusters is the same as the number of modes in the training inputs, one can improve the performance of predictions using the learned kernel parameters. Here, we also examine what will happen, when the MMLS is adopted, but the number of clusters is not necessarily the same as the number of modes in the training inputs.

We use different numbers N_cl of clusters (e.g., 2, 4, 6 and 8) to fit the GMM using the same training inputs (with 3 modes) as in the previous experiment. Combining the results, one can see that if N_cl is slightly larger than the number of modes (e.g., N_cl = 6), then one might actually achieve better prediction accuracies over certain intervals, in comparison to the choice of N_cl = 3. Of course, given a fixed number of training data, on average the number of training data per cluster reduces as N_cl increases. Therefore, if N_cl becomes too large (e.g., N_cl = 8), the prediction accuracies may instead worsen as the number of training data within each cluster decreases. This insight will be useful for us in handling data assimilation problems in the presence of forward-simulator imperfection, yielding improved flexibility and assimilation performance, as will be shown in the next section.

Presented here are the results with respect to the choices of using 2, 4, 6 and 8 clusters to fit the GMM (from top to bottom), respectively.

The preceding section indicates that, when combined with the MMLS, the ensemble-based kernel learning algorithm performs reasonably well in the presented SLP. As discussed previously, the idea of kernel-based functional approximation can also be extended to handle data assimilation problems with imperfect forward simulators.

Handling model imperfection is an important topic in many geophysical data assimilation problems. While there are already substantial efforts, e.g. [

As a proof-of-concept study, in what follows, we illustrate the performance of the integrated data assimilation framework, in which the observations are generated by applying the true forward simulator (x^3 + 1)^{1/2} to each gridblock of the reference model, and then adding 10% Gaussian white noise (relative to the magnitudes) to the simulation outputs. As a result, in data assimilation, we have a set of observations distributed over the same gridblocks as in the reference model.

Top row: Reference model (Panel (a)) used to generate observations (cf.

Top row: Real observations (Panel (a)) generated using the reference model in

The reference model in

As mentioned earlier, to use kernel-based functional approximation, we need to specify the center points m^cp. In the experiments, we do not assume to have hard data to condition on. Instead, we construct m^cp and d^{o,cp} as follows. We set N_cp = 200, and take the center points evenly distributed over the interval (b_l, b_u), where b_l = m_min − 0.1|m_min| and b_u = m_max + 0.1|m_max|, with m_min and m_max being the minimum and maximum values of the initial ensemble of model variables, respectively. To choose d^{o,cp}, we associate the center points with the real observations d^o. With this treatment, for each given center point, the corresponding m^cp and d^o may not be “consistent”. This inconsistency, however, is partially taken into account by including the residual term.
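This construction of center points can be sketched as follows (a direct transcription of the rule above; for simplicity the sketch includes the interval endpoints, and the function name is our own):

```python
import numpy as np

def build_center_points(m_ens, n_cp=200):
    """Center points evenly distributed over (b_l, b_u), with bounds extended
    10% beyond the min/max values of the initial ensemble of model variables."""
    m_min, m_max = m_ens.min(), m_ens.max()
    b_l = m_min - 0.1 * abs(m_min)
    b_u = m_max + 0.1 * abs(m_max)
    return np.linspace(b_l, b_u, n_cp)
```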

In the experiments, we consider two scenarios. In the first one, we study the case in which there is no imperfection in the forward simulator, i.e., the simulator used in data assimilation is the true one, (x^3 + 1)^{1/2}. Our objective here is to inspect the impact of the kernel-based model-error correction (MEC) mechanism on the performance of data assimilation, when there is no imperfection in the forward simulator but MEC is still adopted. For later reference, we call this the perfect-simulator scenario (PS). In the second scenario, we investigate the case in which imperfection indeed exists in the forward simulator, with the simulator used in data assimilation being x^2. We examine how the performance of data assimilation may change in the presence of simulator imperfection. Likewise, we call this the imperfect-simulator scenario (IS).
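For reference, the two forward simulators and the residual function that the kernel machine aims to approximate can be written as (a sketch; note that the true simulator is only defined where x^3 + 1 ≥ 0):

```python
import numpy as np

def f_true(x):
    """True forward simulator, (x^3 + 1)^(1/2), defined for x^3 + 1 >= 0."""
    return np.sqrt(x ** 3 + 1.0)

def f_imperfect(x):
    """Imperfect forward simulator, x^2."""
    return x ** 2

def residual(x):
    """Model-error residual between the true and imperfect simulators."""
    return f_true(x) - f_imperfect(x)
```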

As a side remark, we note that the true ((x^3 + 1)^{1/2}) and the imperfect (x^2) forward simulators here are identical to those in the previous numerical example (cf. Eqs (

In the PS, we conduct a comparison study involving two experiments. In one of them, no MEC is adopted, since the forward simulator is known to be perfect. In the other, kernel-based MEC is introduced into data assimilation, even though the forward simulator is perfect (in many places, we simply say MEC when there is no confusion). Except for this difference, the other settings in these two experiments are identical. We note that, in the relevant experiment, MEC is conducted by combining the relevant equations above. The number N_cl of clusters is set to 1 in the current experiments, as we know that the initial ensemble is generated using a Gaussian simulation method (hence unimodal). We will examine the impact of N_cl on data assimilation in the IS.

In comparison to the real observations in

Results in Panels (a) and (c) correspond to the case without MEC adopted in data assimilation, whereas those in Panels (b) and (d) to the case with MEC. Unless otherwise stated, data mismatch in the experiment with MEC is always calculated using the modified forward simulator with a residual term, as in

| | Initial ensemble | Final ensemble (no MEC) | Final ensemble (with MEC) |
|---|---|---|---|
| Data mismatch (mean ± STD) | 1.0694 ± 0.5361 (×10^{7}) | 3.7326 ± 0.0223 (×10^{4}) | 6.2645 ± 1.7551 (×10^{4}) |
| RMSE (mean ± STD) | 2.5240 ± 0.3070 | 1.0889 ± 0.0025 | 0.8836 ± 0.0133 |

In both experiments, the maximum number of iteration steps of the iES is set to 10. In the experiment without MEC, however, the iES stops at iteration step 7, due to an alternative stopping criterion that terminates the iES when the average data mismatch falls below four times the number of observations (i.e., 4 × 12,000) for the first time. This early-stopping phenomenon indicates a higher risk of over-fitting the observations, should the iteration process have continued after step 7. On the other hand, in the experiment with MEC, since there are more parameters adopted in data assimilation, intuitively one might expect the problem of over-fitting to be even more severe. Surprisingly, it turns out that over-fitting appears to be avoided, while the iteration stops at the maximum step. As a result, the final mean data-mismatch value in the experiment with MEC is higher than that in the experiment without MEC, as reported in

For quality check, in

One can also observe an interesting phenomenon by comparing the spreads of box plots in

As aforementioned, in SLP, typically one has many (matched) input-output pairs as the training data. In contrast, in data assimilation problems, we use a single realization (or one shot) of the observations (at a given time instance and a given spatial location) to infer possible model variables. As a result, in SLP, one often has the luxury of splitting a dataset into two parts, one for training (and cross-validation) and one for testing; whereas in data assimilation with MEC, this kind of luxury typically does not exist. This makes MEC a particularly challenging problem. Indeed, apart from the potential inconsistencies between the observations and the estimated model variables, only one-shot observations are used for residual-functional estimation, which makes it difficult for the updated forward simulator to generalize to unseen data (e.g., new input-output pairs), as our experiments indicate (results not shown).

Bearing the above challenges in mind, when evaluating the performance of MEC, we do not particularly focus on inspecting the generalization ability of the updated forward simulator (after all, the goal of data assimilation is to estimate the ground truth corresponding to real observations). Instead, we adopt the following cross-validation procedure, namely, for a given ensemble of model variables in the experiment with MEC, we compare the corresponding data mismatch values, when the residual term

Following this notion,

At a given iteration step, these differences are derived using data mismatch values that are calculated with the residual term excluded from

Based on the experiment results in the PS, we conclude that, in this particular case study, even though the forward simulator is perfect, it appears still beneficial to integrate kernel-based MEC into data assimilation for performance improvements.

In parallel to the results in the PS, we first report the results with N_cl = 1 in the IS. In this case, we also compare the assimilation performance in one experiment where no MEC is introduced, and in another experiment where kernel-based MEC is adopted, with the number of clusters N_cl = 1. The initial ensemble of kernel parameters is generated in the same way as in the case study of SLP.

| | Initial ensemble | Final ensemble (no MEC) | Final ensemble (with MEC) |
|---|---|---|---|
| Data mismatch (mean ± STD) | 6.5372 ± 5.2423 (×10^{7}) | 4.5211 ± 0.0590 (×10^{5}) | 1.3248 ± 1.2528 (×10^{5}) |
| RMSE (mean ± STD) | 2.5240 ± 0.3070 | 1.2053 ± 0.0091 | 1.0696 ± 0.0174 |

The subsequent results in Figs

Overall, the experiment results presented here confirm again that, in this particular case study, kernel-based MEC helps improve the performance of data assimilation in the presence of imperfection in the forward simulator.

An alternative idea for MEC in data assimilation would be that, in

In the experiment, we also choose to integrate this bias-based MEC into ensemble-based data assimilation. To initialize an ensemble of biases, we first compute an ensemble of residuals between the real observations and the simulated observations with respect to the initial ensemble. We then calculate the mean and covariance of the residuals, and use these statistics to draw an (initial) ensemble of biases, in a way similar to that used to generate the initial ensemble of model variables. After that, similar to the setting in

^{6}) ± 80.1288, and the corresponding RMSEs are 1.9767 ± (2.7584 × 10^{−4}). Relative to the mean values, the tiny STDs of the final data mismatch and RMSEs suggest that ensemble collapse is a severe issue in the experiment with bias-based MEC.

The relative under-performance of bias-based MEC might be partially attributed to the simplifying assumptions, e.g., whiteness, stationarity, and normality [

As in SLP, when using kernel-based functional approximation for MEC, one can also choose to first group the model variables into different clusters, and then estimate an ensemble of kernel parameters for each cluster. The final residual functional is taken as the weighted average of the individual (kernel-based) approximation functionals estimated from each cluster, similar to the idea described in

We test the assimilation performance with different N_cl values. In the experiment, N_cl takes its value from the set {1, 2, ⋯ , 10}. As one can see, except for the choice of N_cl = 2, all other choices tend to result in lower RMSEs, in comparison to the choice of N_cl = 1. This thus suggests that, similar to the results in SLP, the performance of data assimilation may benefit from a value of N_cl that exceeds the actual number of mode(s) in the distribution of model variables. On the other hand, the optimal choice of the value of N_cl remains an open problem in the current work.

This work focuses on addressing simulator imperfection in data assimilation from the perspective of functional approximation, which leads to an ensemble-based data assimilation framework that integrates functional approximation, through a certain machine learning approach, into an ensemble-based assimilation algorithm. For better comprehension of how such an integration can be established, we start by considering a class of supervised learning problems, and then discuss the similarity between supervised learning and variational data assimilation. This insight (of similarity) not only leads to an ensemble-based approach to solving supervised learning problems, but also sheds light on the development of an ensemble-based data assimilation framework that, in a natural way, merges machine learning and data assimilation methods to handle simulator imperfection. In the current work, we adopt a kernel-based learning approach to functional approximation. Nevertheless, as discussed earlier, one may also employ other suitable machine learning methods for the purpose of functional approximation.

For performance demonstration, we first study a supervised learning problem. Through the investigations therein, we identify a challenge that may arise when using kernel-based ensemble learning in the presence of multi-modal training inputs. To overcome this problem, we consider a multi-modal learning strategy that helps achieve reasonably good results. Moreover, this multi-modal strategy can be transferred to the data assimilation problem later, also helping improve the performance of data assimilation. Apart from the multi-modal strategy, in the data assimilation problem, we also inspect the performance of the ensemble-based data assimilation framework with the integrated, kernel-based model-error correction (MEC) mechanism. The experimental results indicate that, in this particular case study, using kernel-based MEC tends to improve data assimilation performance, whether or not simulator imperfection is present. In addition, the experimental results also show that kernel-based MEC tends to outperform an alternative, bias-based MEC mechanism.

As a proof-of-concept study, in the current work, we consider a relatively simple data assimilation problem, in which there is only one unknown parameter to estimate for each gridblock. Conceptually, based on Eqs (

Another line of future research will be to explore the integrations of the ensemble framework into deep learning models. Such integrations are expected to result in some ensemble deep learning methods that share certain implementational conveniences, e.g., derivative free (no back propagations) and fast implementation, as observed in various ensemble data assimilation methods in geosciences. Admittedly, however, it remains to be seen whether the resulting ensemble learning methods would be able to achieve other practical advantages, such as accuracy or robustness, in comparison to the more conventional ones.