Four-dimensional local ensemble transform Kalman filter: numerical experiments with a global circulation model

Abstract We present a four-dimensional ensemble Kalman filter (4D-LETKF) that approximately and efficiently solves a variational problem similar to that solved by 4D-VAR, and report numerical results with the Simplified-Parametrized primitive Equation Dynamics model, a simplified global atmospheric model. We discuss the relationship of 4D-LETKF to other ensemble Kalman filters and, in our simulations, compare it with two simpler approaches to assimilating asynchronous observations. We find that 4D-LETKF significantly improves on the approach of treating asynchronous observations as if they occur at the analysis time. For a sufficiently short analysis time interval, the approach of computing innovations from the background state at the observation times and treating those innovations as if they occur at the analysis time is comparable to 4D-LETKF, but for longer analysis intervals, we find that 4D-LETKF is superior to this approach.


Introduction
This paper presents an efficient method of implementing a fourdimensional ensemble Kalman filter, which we call the Four-Dimensional Local Ensemble Transform Kalman Filter (4D-LETKF) for assimilating asynchronous observations. Through a change of coordinates, three-dimensional LETKF  is equivalent to the Local Ensemble Kalman Filter (LEKF) of Ott et al. (2004), and locally its analysis is equivalent to the Ensemble Transform Kalman Filter (ETKF) of Bishop et al. (2001) and Wang et al. (2004). Unlike variational-based data assimilation schemes, ensemble Kalman filter (EnKF) schemes represent the forecast uncertainty with an ensemble of forecasts. Ensemble based data assimilation is a natural approach since numerical weather prediction centres, such as NCEP, Meteorological Service of Canada (MSC) and ECMWF, already employ ensemble forecasting operationally to assess the uncertainty in their forecasts. Using this information in the data assimilation procedure has the potential to provide better initial conditions, for both the main forecast and the ensemble forecast. The goal of an EnKF is to generate at regular time intervals an analysis ensemble. This analysis ensemble of model states should reflect both an estimate of the true atmospheric state (through its mean) and the uncertainty of this estimate (through its spread). If successful, then applying the forecast model to the analysis ensemble from one time yields a background ensemble at the next time. In this case, the background ensemble represents a probabilistic estimate of the atmospheric state before new observations are assimilated. The analysis cycle is completed by adjusting the background ensemble to better fit the new observations. In particular, the analysis ensemble mean is a weighted average of the background ensemble mean and the observations, with the weights determined from the background and observation uncertainties. More precisely, the analysis ensemble mean is the model state that best fits the given background and observation probability distributions.
In a Kalman filter, these distributions are assumed to be Gaussian and the uncertainties are thus characterized by covariance matrices. The background and observation covariances determine, via Bayes' rule, the analysis covariance. In ensemblebased schemes, the background covariance is computed as the sample covariance of the background ensemble, and thus to be consistent one must choose an analysis ensemble whose sample covariance matches the analysis covariance determined by the Kalman filter. Early versions of EnKF (Evensen, 1994;Burgers et al., 1998;Houtekamer and Mitchell, 1998)  stochastically, in the sense that they perturb the observations randomly and independently when generating each ensemble member. Another approach is to add the columns of a square root of the analysis error covariance matrix to the analysis ensemble mean. This approach is deterministic, provided that a specific choice of the matrix square root is specified, though many different choices for the square root are possible (Tippett et al., 2003). Examples of a deterministic EnKF include the Ensemble Adjustment Kalman Filter (EAKF) of Anderson (2001), ETKF of Bishop et al. (2001), Ensemble Square Root Filter of Whitaker and Hamill (2002) and LEKF of Ott et al. (2004).

do this
In an operational setting, the analyses are typically generated every 6 h, though many observations are available more frequently. Limited computational time is allowed for each analysis (less than 10 min at NCEP). Given such constraints, an efficient algorithm to assimilate asynchronous observations becomes important. One approach is the 4D-EnKF of Hunt et al. (2004). This four-dimensional extension of EnKF finds the analysis ensemble mean by fitting the linear combinations of the trajectories of the background ensemble to the asynchronous observations. This scheme may be thought of as an approximation to 4D-VAR (Fertig et al., 2007), the four-dimensional data assimilation technique used operationally by ECMWF (see for example, Le Dimet and Talagrand, 1986;Courtier et al., 1994;Rabier et al., 1998Rabier et al., , 2000. The main advantages of 4D-EnKF over 4D-VAR are that, as with other ensemble Kalman filters, it does not require computing the linear adjoint model for the (nonlinear) forecast model, and it propagates background covariance information from one analysis cycle to the next. Fertig et al. (2007) show that 4D-LETKF is comparable with 4D-VAR in a perfect model simulations with Lorenz-96 model (Lorenz, 1996). Here, we describe 4D-LETKF in more detail and present results of simulations with the Simplified-Parametrized primitive Equation Dynamics (SPEEDY) model (Molteni, 2003). In Section 2, we derive 4D-LETKF from the variational point of view, and describe how we apply the analysis locally and incorporate variance inflation. In Appendix A, we show that when the observation operator is linear, the analysis of 4D-LETKF is equivalent to that of 4D-EnKF  and, in case of synchronous observations, LEKF . For an ensemble of size k, LEKF and 4D-EnKF perform the analysis in a (k − 1)-dimensional space E, using an orthonormal basis consisting of eigenvectors of the background covariance matrix. Here, we show that performing the analysis in a k-dimensional space S with the background ensemble perturbations as the 'basis', each analysis becomes computationally more efficient. This choice of coordinates is equivalent to that used in the analysis of ETKF (Bishop et al., 2001). While our choice of matrix square root is different than in Bishop et al. (2001), it agrees with the square root used by Wang et al. (2004). A stepby-step pseudo-algorithm for implementing 4D-LETKF is given in Appendix B.
Simpler approaches for dealing with asynchronous observations are to treat the observations as if they occur at the analysis time Lindskog et al., 2001) or use First Guess at the Appropriate Time (FGAT) of Huang et al. (2002). FGAT computes the innovations, the differences between the observations and the model state, at the asynchronous times and treats these innovations as if they occur at the analysis time. Both approaches were implemented with 3D-VAR  in a High Resolution Limited Area Model (HIRLAM). In Section 3, we adapt these approaches to the LETKF formulation and compare both schemes with 4D-LETKF in perfect model simulations with the SPEEDY model (Molteni, 2003 numerical results show that 4D-LETKF produces better analyses than these two simpler approaches when the analysis time interval is sufficiently long, while the results are similar to FGAT approach over shorter analysis intervals. We conclude this paper with a short summary in Section 4.

Formulations
The goal of data assimilation is to estimate the true state x t n of a system, such as the atmosphere, at current time t n given noisy observations where H l and o l are the observation operator and observation error, respectively, at times {t l : l = 1, 2, . . . , n}, where t l < t l+1 . Typically, the true state x t n and its underlying dynamics are unknown. We assume that the evolution of x t n is modelled by a chaotic dynamical system: where M denotes a non-linear operator. In the derivations below, we assume that M is invertible. Indeed this is always the case if M is the evolution operator for a system of ordinary differential equations. However, M −1 may be difficult to compute in practice because integrating the model backward in time can be highly unstable, for example, in atmospheric models where the underlying system of partial differential equations is irreversible. We emphasize that we use M −1 only from a theoretical point of view below, and that the filter we develop in Section 2.1 does not require M −1 to be computed. From a probabilistic point of view, we want not just a specific estimate of x t n , but a probability distribution p(x n ) representing the likelihood that a particular model state x n is equal to x t n . Assuming that observations prior to times t 1 have yielded a background distribution p(x n ), the goal of data assimilation is to find the analysis probability distribution p(x n | y o 1 , y o 2 , . . . , y o n ) given observations {y o l : l = 1, . . . , n} in addition to the background information. The analysis state x a n is typically chosen to be the mode of this distribution, that is, the most likely state. If the distribution of observation errors is known, then applying Bayes Hereafter, we denote Kalman filters generally assume Gaussian background and observation error distributions: p(x n ) ∼ N (x b n , B) and p(y o |x n ) ∼ N (H (x), R), respectively. Here, components of x depend on x n through the relation x n−l = M −l x n and x b n is the background state at time t n , obtained by feeding the prior analysis state x a 0 at time t 0 into (2) and iterating. B is the background error covariance matrix and R is the observation error covariance matrix. We assume that the observation errors at different times {t l : l = 1, . . . , n} are uncorrelated, so the matrix R is block diagonal where each block is an Here, s l denotes number of observations at time t l . With these assumptions, maximizing (3) is equivalent to minimizing the cost function: with constraints defined by the dependence of x l at times {t l : l = 1, . . . , n − 1} on x n via the inverse of M. The cost function (4) is the same as for 4D-VAR, except that in 4D-VAR the background term and the vector x are expressed in term of model state x 1 at the beginning of the analysis time interval. (This distinction will become irrelevant for the approximate cost function we minimize in the next section.) Operational implementations of 4D-VAR (e.g. at ECMWF) generally use the Tellus 59A (2007), 5 same B for each analysis. However, the background uncertainty can vary considerably from time to time, so it is desirable to allow B to vary from one analysis to the next.
Kalman filters take the covariance from the analysis probability function p(x n | y o 1 , . . . , y o n ) and use it to determine the background covariance for the next analysis cycle. In an ensemble Kalman filter, an ensemble of initial conditions distributed according to the analysis probability distribution is propagated by the model to produce an ensemble of background states whose sample mean and covariance determine the background probability distribution for the next analysis. In the next section, we use this approach to derive our four-dimensional ensemble Kalman filter. Like other Kalman filters, we will make a linear approximation that makes the cost function quadratic; allowing us to minimize the cost function exactly. In this four-dimensional setting, our approximation also turns the constrained minimization problem described in this section into an unconstrained problem.

Filter derivation through variational approach
Ensemble Kalman Filters estimate the true state x t n with an ensemble whose mean represents an estimate of x t n and covariance represents the uncertainty in the estimate. Starting with a background ensemble {x b(i) n , i = 1, . . . , k}, the analysis step assimilates the observations to produce an analysis ensemble {x a(i) n , i = 1, . . . , k} (which is used to provide the background for the next analysis cycle, as described above). In the cost function (4), we replace the background state x b n with the sample meanx b n of the background ensemble, and the background error covariance matrix B with the sample covariance matrix where k is the number of ensemble members and is the m × k matrix of background ensemble perturbations. Note that this approximation is problematic for k < m since P b n is not a full rank matrix, and hence is not invertible. However, (P b n ) −1 is well defined on the 'ensemble subspace' spanned by the columns of X b n . Thus, an ensemble Kalman filter minimizes the cost function J over all model states x n such that x n −x b n is in this ensemble subspace, so that J is well defined. As we discussed earlier, computing the cost function J as a function of the model state x n at time t n would require using M −1 , which may be impractical. However, by storing the background ensemble {x b(i) l , i = 1, . . . , k} at the observation times t l as the model is integrated from the previous analysis time, we can approximate M −1 as follows. To each x n for which J is well defined, we will associate a corresponding model state x l in the space spanned by the background ensemble at time t l , using the same linear combination of the ensemble members as for x n . The background distribution of x l will have mean equal tō x b l , the sample mean of the ensemble at time t l , and covariance In this sense, at each time t l we will use the Gaussian background distribution associated with the ensemble at that time.
To make this approach more precise, we employ a precondition (or a coordinate change) by expressing the deviation of a state x l from the background mean statex b l as a linear com-bination of the background ensemble of perturbations, that is, where the weight w ∈ R k is to be determined. Here w does not depend on l, so the sequence x 1 , x 2 , . . . , x n is a linear combination of the ensemble trajectories. The resulting trajectory represents an approximate model trajectory, rather than an exact trajectory of (2) as described earlier. By making this approximation, we can consider the unconstrained problem of minimizing J as a function of w. Next, we approximate the observation vector corresponding to the model state x l at time t l by: Here the ith column vector of the s l × k matrix Y b l is the deviation of H l (x b(i) l ) from its ensemble average. That is, n , and using (5), (8) and (9), reduce the cost function (4) to: That is, we reduce the (n × m)-dimensional constrained minimization problem to a k-dimensional unconstrained minimization problem. Note that in the w coordinate system, the background error covariance matrix becomes the identity and hence we do not have to invert it. This choice of coordinates is analogous to a preconditioning step that often done in variational methods, whereby the cost function is expressed in terms of a vector whose background covariance is the identity matrix. Lorenc . For non-linear H l , these quantities are different.
(2003) suggests using cost function similar to (11), but without the linear approximation (9), to incorporate an ensemble-based background covariance into 3D-VAR or 4D-VAR, and Zupanski (2005) uses the same cost function in an ensemble filter. A similar cost function has also been used in hybrid ensemble/3D-VAR methods (Buehner, 2005;Wang et al., 2007).
The minimum of (11) is obtained by setting The solution of this equation is the analysis weight vector wherẽ The analysis error covariance matrixP a in the ensemble space is the inverse of the Hessian of the cost function (11) (Fisher and Courtier, 1995;Zupanski, 2005). The analysis state is obtained by substituting (12) into (8): In the case that H l is linear, eqs (13) and (14) are equivalent to the standard Kalman filter equations, which minimize J in closed form.
To complete the analysis, we generate an analysis ensemble of model states whose mean isx a and whose error covariance matrix in the model space is P a = X bPa (X b ) T . To satisfy these constraints, we update the ensemble using where W a(i) is the ith column of matrix W a = [(k − 1)P a ] As we will show in Appendix A, this four-dimensional filter is equivalent to 4D-EnKF of Hunt et al. (2004). The main difference is in the choice of coordinate (8). Next let us describe the localization and a way to do variance inflation.
2 In order that the mean of the analysis ensemble bex a n , we need that X a n 1 =X b n W a 1 to be zero, where 1 = (1, 1, . . . , 1) T . Since X b l 1 =0, it suffices that 1 be an eigenvector of W a . This is also true for the symmetric square root W a . In each analysis, the local region size is 3 × 3 grid points and the observations frequency is 6 h.

Localization
To perform ensemble data assimilation for a global atmospheric model with an ensemble of moderate size, some form of localization is necessary. As pointed out in Anderson (2001), Hamill et al. (2001) and Houtekamer and Mitchell (2001), the localization suppresses spurious long-range correlations produced by a limited ensemble size. On the other hand, it also improves the efficiency of the scheme because each local analysis involves much less data than a global analysis. Our localization is similar to that of Keppenne (2000) and Ott et al. (2004), in that the analysis is done separately and, if desired, in parallel for different local regions that cover the globe. In our formulation, the localization is relatively simple; for each grid point of the model, we choose a local subset of the global observations and apply the equations of Section 2.1 using only the local observations. To be more precise, see step (ii) of Appendix B.
In order to use the analysis ensemble members as initial conditions for the forecast model, it is essential that the results of the analysis be similar at nearby grid points. This can be ensured by choosing similar sets of observations for neighbouring grid points. As long as the observation sets overlap heavily, the analy-ses will be similar. In practice, we find that such heavy overlap is not always necessary. However, our choice of the matrix square root is important in achieving consistency of nearby local analysis; the symmetric square root ensures that W a depends continuously onP a . Indeed, with localization, other choices of the matrix square root can cause our filter to diverge (Harlim, 2006).

Variance inflation
In order to compensate for the tendency of a small ensemble to underestimate uncertainty, it may be desirable to artificially inflate the background error covariance matrix P b before each analysis. (Or, one could instead inflate the analysis error covariance matrix P a after each analysis.) A common approach is multiplicative variance inflation (Anderson and Anderson, 1999): multiply the background ensemble perturbations X b n by a constant factor √ 1 + r with r > 0, which effectively multiplies P b by (1 + r). A similar result can be achieved more efficiently by leaving X b n alone and replacing (13) bỹ Tellus 59A (2007), 5 Here (k − 1) I represents the inverse of the background covariance matrix in the w coordinate system, and in (16) we have multiplied this covariance matrix by (1 + r).

Relationship with other filters
Now, suppose that there are no observations at time {t l : l = 1, 2, . . . , n − 1} and that all observations at time t n are used (i.e. no localization is done). Equation (14) is reduced to the analysis mean update in a standard three-dimensional ensemble Kalman filter. The analysis ensemble update (15) is different from the ETKF of Bishop et al. (2001) because the choice of matrix square root is different. Our choice of the symmetric square root is the same as the Spherical-Simplex Ensemble Transform Kalman Filter (Wang et al., 2004). That is, our matrix W a is equivalent to Wang et al. (2004) transform matrix T T (eqs 5 and C1).
In the three-dimensional setting, the analysis used in the 4D-EnKF  is the same as that in LEKF . We show in Appendix A that LETKF is identical to LEKF except in the coordinate representations of each analysis; LETKF is more efficient since it requires no computations of the eigenvalues of the background error covariance matrix P b . In numerical experiments with the Lorenz-96 model (Lorenz, 1996), we are able to reproduce the results of Ott et al. (2004) with significant computational savings (Harlim, 2006).

SPEEDY model
In this paper, we apply our four-dimensional filter to a primitiveequation Global Circulation Model (GCM). This spectral model (nicknamed SPEEDY, for Simplified Parametrizations primitive-Equation Dynamics; see Molteni, 2003, for details) has 7 vertical levels (with sigma level 0.950, 0.835, 0.685, 0.510, 0.340, 0.200 and 0.080) and a horizontal resolution corresponding to a triangular spectral truncation at total wave number 30 (this yields 96 × 48 grid points in a standard Gaussian grid). There are five basic prognostic variables: vorticity, divergence, absolute temperature, specific humidity and logarithm of surface pressure. In addition to these variables, the model includes some diagnostic variables (such as saturation specific humidity, relative humidity, dry and moist static energy and saturation moist static energy) whose dynamics follow some simple models of physical processes (such as convection, large-scale condensation, clouds, short-wave and long-wave radiations and diffusion). With these simplified parametrizations, the model is designed to be (at least) an order of magnitude faster (in CPU time) than an operational GCM with similar horizontal resolutions.
The model dissipation and external forcing are determined by winter-time climatological fields sea-surface temperature, surface temperature and moisture in the top soil layer, snow depth, bare-surface albedo, fraction of the sea-ice, land-ice and land-surface covered by vegetation. The model forcing are updated daily with no diurnal variations. The prognostic variables are post-processed into zonal and meridional wind components (U-and V-wind), absolute temperature (T), specific humidity (Q), geopotential heights (Z) on pressure levels (925,850,700,500,300,200 and 100 hPa) and surface pressure (Ps).
Theoretically, the forecasts produced by SPEEDY model are less realistic than a more sophisticated model with higher resolution such as the NCEP GFS. However, this model serves our purpose in this paper since it is computationally inexpensive and it describes the atmospheric variability in the Northern Hemisphere during winter-time reasonably well.

Experimental design
In this paper, we compare 4D-LETKF to two approaches for handling asynchronous observations (implemented to LETKF). As we mentioned ealier, both approaches were formerly implemented in a 3D-VAR scheme.
The first approach Lindskog et al., 2001) is to treat all observations as if they occur at analysis time. In our variational formulation, this approach corresponds to replacing (11). In all of our experiments, we observe the same variables at each time t l , so H l = H n for all l. In this case, we replace Y b l by Y b n and the cost function (11) becomes Note that in the second term of (17), all model dependent variables are at time t n . Here after, we refer to this approach as 3D-LETKF where asynchronous observations are treated as observations at analysis time. The second approach is to treat each observation innovation (the difference between the observation and the background model state at observation time) as if it occurs at the analysis time. In our variational formulation, this corresponds to replacing x b(i) l by x b(i) n in (10) but keepingx b l in (11). In our scenario where H l = H n for all l, the cost function then becomes Huang et al. (2002) call this way of handling observations as the First Guess at the Appropriate Time (FGAT). Hereafter, we refer it as FGAT-LETKF. For each of the cost function (17) and (18), we perform the analysis using equations analogous to eqs (14)-(16) that we use for cost function (11). Specifically, for FGAT-LETKF we change Y b l to Y b n whenever it appears in these equations. For 3D-LETKF, Tellus 59A (2007), 5  (14). Thus, 3D-LETKF uses the background ensemble only at the analysis time, while both FGAT-LETKF and 4D-LETKF use the background ensemble states at each observation time, though FGAT-LETKF uses only the background mean at times t l < t n .
We compare these three schemes with a 12-h analysis cycle and a 24-h analysis cycle. A true trajectory is created by running the SPEEDY model for 2 months starting from NCEP reanalysis January 01 1982. Then, we generate simulated observations by adding a normally distributed noise with zero mean to the true states every 3 h for a 12-h analysis cycle and every 6 h for a 24-h analysis cycle. Here the observations errors for each variables are: 1 m s −1 for (both zonal, U-wind and meridional, V-wind) wind speed, 1 K for temperature, 0.0001 kg kg −1 for specific humidity, and 100 Pa for surface pressure, which are small compared to the model natural variability (standard deviation from its temporal mean without data assimilation). For pressure level 500 hPa, the natural variabilities are 6.78 m s −1 for U-wind, 6.84 m s −1 for V-wind, 2.92 K for temperature, 0.0005 kg kg −1 for specific humidity, 69.41 m for the geopotential height and 695.05 Pa for the surface pressure. At each (sigma) level, we observe about 22% of the grid points (1008 locations) with observations uniformly distributed between 75 • N and 75 • S.
In each data assimilation experiment, we use an ensemble of size 20. The initial ensemble consists of states from a long integration of the SPEEDY model at 20 randomly chosen times. For each analysis, we use observations from a two-dimensional local region of size 3 × 3 grid points; that is, we use all observations from the same vertical level (recall that there are seven vertical levels in the model) and up to one grid spacing away in both latitude and longitude. For the analysis at each grid point, the number of observation locations varies among 1, 2 and 4 in each local region. For the 12-h analysis cycle, we also add results assimilated with a two-dimensional local region of size 3 × 5 grid points (3 grid points in latitude and 5 in longitude), so that in each local analysis the number of observation locations varies among 2, 3, 4 and 6. Other local region sizes we tried yield similar or worse results in all the cases we show.
Note that for every pair of horizontally neighbouring grid points, their local regions share a set of observations that constitute all of the observations for one region but only half or 2/3 of the observations for the other region. Furthermore, the local Fig. 11. Average analysis errors of surface pressure (in Pa) for 24-h analysis cycle with 4D-LETKF (top) and FGAT-LETKF (bottom) during February 1982. In each analysis, the local region size is 3 × 3 grid points and the observations frequency is 6 h. regions do not overlap vertically. While this makes it possible to have incompatible analysis at neighbouring grid points, below we do not find the analysis fields to be imbalanced.
In the 12-h analysis cycle, we show simulations with variance inflation coefficient r = 20% in the case of 3 × 3 local region after comparing simulations with r = 5, 15, . . . , 30%. In the case of 3 × 5 local region simulations, we use variance inflation coefficient r = 30% after comparing simulations with r = 20, 25, 30, . . . , 40%. For the 24-h analysis cycle, we compared simulations with variance inflation coefficient r = 30, 35, . . . , 60% and show results with r = 40%. In each case, the chosen value of r yielded the best results for 4D-LETKF and FGAT-LETKF, and the results did not improve substantially with other values of r for 3D-LETKF.

Results
In the results below, we measure the quality of each analysis by calculating the root mean square (rms) difference between the true state and the analysis ensemble mean at analysis time t n . Here after, we refer to this quantity as the analysis error. The temporal average analysis error is calculated in the rms sense.
In Fig. 1, we show average analysis errors with a 12-h analysis cycle as functions of pressure (in hPa). We see that on average (during January and February 1982), the analysis errors of both 4D-LETKF (solid) and FGAT-LETKF (dashes) are very similar and significantly outperform 3D-LETKF (dash-dotted) on all shown variables (U-wind, temperature and geopotential height). In Fig. 2, we show for 4D-LETKF and FGAT-LETKF the geographical dependence of the average analysis temperature error for February 1982. While the FGAT-LETKF errors are smaller in some areas, the 4D-LETKF errors are notably better in the tropical regions where the errors are the largest (greater than 0.15). In Fig. 3, where the average analysis errors for surface pressure (in Pa) are shown, we see a similar pattern. That is, the errors of FGAT-LETKF are generally larger than those of 4D-LETKF in regions where both errors are large (greater than 25), mainly in the tropics but also near southern Greenland. In Figs. 4 and 5, we show the analysis errors of assimilations with local region of size 3 × 5. Note that the experiments with larger local region require more variance inflation (as noted earlier, r = 30% for this choice, and 20% for those assimilated with local region of size 3 × 3) because the ratio of the ensemble size and the number of model variables in each local analysis decreases as a function of local region size. Secondly, the larger the window of local region suggests more observations are included in each local analysis. Thus, the larger the uncertainties are reduced which implies that each ensemble members are more alike the other members. In fact, we observed a worse errors when the window size is larger.
Note also that there are significant small-scale fluctuations in the surface pressure analysis error in Figs. 3 and 5. This is an artefact of our observation network, which provides observations only at every other model grid point, so that as noted above the number of observations assimilated varies significantly from one grid point to the next. The analysis ensemble fields do not seem to be imbalanced, however. To check the balance, in Fig. 6, we show the average absolute value of the surface pressure tendency and the precipitation rate, both plotted as functions of forecast time. Here, both quantities are averaged over the globe excluding the polar regions above 75 • latitude. In Fig. 6, we choose three different times (January 16, February 1 and February 15) from our assimilation period and compute a 24-h forecasts of both quantities with initial conditions from the true state (solid), from the analysis with local region of size 3 × 3 (dashes) and from the analysis with local region of size 3 × 5 (dash-dotted line). In these results we see no sign of imbalance at the analysis time for either local region size.
Below, we will show results of a 24-h analysis interval with local region of size 3 × 3. First, we include results (Figs. 7 and 8) that show the effect of using less frequent observations (every 6 h rather than 3 h) in a 12-h analysis cycle. For these figures, we use variance inflation r = 20%. Comparing with Figs. 2 and 3, we see that using fewer observations increases the analysis errors significantly in the mid-latitudes but not in the tropics. We use the same observations (every 6 h) for the 24-h analysis interval results below. In Fig. 9, we see an identical representation of results as in Fig. 1, but all simulations are run with 24-h analysis cycle. We note that 4D-LETKF (solid), on average, yields a better analysis compared to FGAT-LETKF (dashes). For the 24-h analysis cycle, 3D-LETKF fails to bring the analysis errors to be at least comparable to observation errors even after cycling through 2 months of assimilations (results are not shown). Figures 10 and 11 show the average analysis errors of temperature at 500 hPa and surface pressure with 24-h analysis cycle during February 1982, respectively. These numerical results clearly suggest advantages of 4D-LETKF over FGAT-LETKF when the analysis time is longer.
To understand further the circumstances that cause 4D-LETKF and FGAT-LETKF to differ, consider (for the sake of exposition) a scenario where observations are available at a single time t l < t n . Then eq. (14) for the 4D-LETKF analysis mean becomes (substituting from eq. 16) x a n =x b n + X b n (k − 1)I/(1 + r ) If in addition H l is linear, H l (x) =H l x, then Y b l = H l X b l , and using the matrix identity In FGAT-LETKF we replace X b l by X b n , which affects the analysis in two ways. Changing X b n (X b l ) T to X b n (X b n ) T overestimates the cross-covariance between the background ensemble at the observation time t l and at the analysis time t n . Changing X b l (X b l ) T to X b n (X b n ) T ignores the change in the spread of the background ensemble between times t l and t n . Below we find in fact that the amount of change in ensemble spread over the analysis time window is a reasonable predictor of the size of the difference between the 4D-LETKF and FGAT-LETKF analysis error. Figures 12 and 13 show the geographical distribution of respectively the 12-and 24-h growth of the ensemble spread. We compute this growth factor as the ratio of the average amplitude Tellus 59A (2007), 5 of the ensemble perturbations X b n at analysis time t n and the average amplitude of X b 0 at the start t 0 of each analysis window, where each (rms) average is taken over February 1982. This averaged ratio simply reflects the growth in forecast uncertainties. For temperature field, the largest growth occurs mostly in the tropics after 12 h (Fig. 12). After 24 h, however, the largest growth is in the extratropics (see Fig. 13). In both figures, we see the largest growth for the surface pressure in similar regions as for temperature, but the amount of growth is much higher in the extratropics. These higher growth factors, especially for the temperature, are geographically correlated with those regions with larger analysis errors produced by FGAT-LETKF (compare Fig. 12 with Figs. 2 and 3 for the 12-h analysis cycle or Fig. 13 with Figs. 10 and 11 for 24-h analysis cycle). This result suggests that the regions where the FGAT-LETKF analysis is worst compared to 4D-LETKF are those regions with the most rapid growth in the ensemble spread.

Summary
In this paper, we describe a four-dimensional ensemble Kalman filter. In our derivations, we show how our four-dimensional filter differs from related approaches (Bishop et al., 2001;Hunt et al., 2004;Ott et al., 2004;Wang et al., 2004). In addition to the mathematical formulation, we also prescribe a pseudo-algorithm for practical implementation.
In Section 3, we showed some results, simulated on a relatively low resolution global weather model (SPEEDY model). In our simulations, we compared three different approaches for handling asynchronous observations with LETKF. We conclude that 4D-LETKF is in general a better approach than 3D-LETKF and FGAT-LETKF, though with a short enough analysis time interval (12 h in this case), FGAT-LETKF and 4D-LETKF yield comparable results. In this scenario, we also found that our results are not sensitive to the size of local regions used; however, the larger the local regions is used, the more variance inflation is needed. Even with a relatively small local region, we do not find any problem with the balance of the analysis fields produced by 4D-LETKF.
As we increase the analysis interval to 24 h, we see the advantage of 4D-LETKF over FGAT-LETKF. We conclude that this is due to the fact that FGAT-LETKF ignores changes both in size of the background covariance over the analysis time interval and in the correlation between the background ensembles at the analysis time and the observation times, while 4D-LETKF accounts for these changes. In particular, we found that 4D-LETKF outperformed FGAT-LETKF significantly in the regions where the ensemble spread grows the fastest.
In other scenarios, we expect one would see similar results; 3D-LETKF as a very crude approximation to 4D-LETKF, while FGAT-LETKF should provide a much better approximation. How long the analysis time must be to see a sig-nificant difference between 4D-LETKF and FGAT-LETKF will depend on the model and the time distribution of the observations. Within the LETKF framework, both methods have similar computational complexity, though FGAT-LETKF requires applying the observation operator only to the ensemble mean at the intermediate observation times whereas 4D-LETKF requires applying it to each of the ensemble members.

Acknowledgments
We thank Takemasa Miyoshi for his help with modifying the SPEEDY model code, provided by Franco Molteni, to allow for short-range forecasts; Elana J. Fertig for reading through the manuscript; Jun Jie Liu for her advice on coding issues; and both anonymous referees for their constructive critiscims. This research was supported by a James S. McDonnell Foundation 21st Century Research Award, NOAA/THORPEX (grant NA04OAR4310104) and the National Science Foundation (grants DMS 0104087 and ATM0434225).

Appendix A: Variational derivation of 4D-EnKF
In this section, we show that 4D-LETKF and 4D-EnkF  are equivalent for linear observation operators H l , which is assumed to be the case in the latter paper. Consider H l (x) = H l x where H l is an s l × m matrix. In this case, 4D-LETKF is a more efficient way to compute the same analysis as 4D-EnKF. To see this, let us derive 4D-EnKF like in Section 2.1. The fundamental difference between these two schemes is that in 4D-EnKF, (8) is replaced by: where U is an m × (k − 1) matrix whose columns are the eigenvectors of P b corresponding to nonzero eigenvalues. In matrix form where the diagonal component of the diagonal matrix Σ is the eigenvalue of P b corresponding to eigenvector u i (the ith column of U). Note that unlike w ∈ R k in (8), v ∈ R k−1 . The model state in the observation space, H n x n , at time t n is: whereĤ b n = H n U is an s n × (k − 1) matrix. At time t l , l = 1, . . . , n − 1, we need to represent model state x l as a function of state x n so that the asynchronous observation operator H l x l can be approximated in the coordinate system defined by (A1).
Recall that in each analysis, we readjust the background ensemble state {x b(i) n , i = 1, . . . , k} with eqs (13)-(15) to produce analysis ensemble state {x a(i) n , i = 1, . . . , k}, for which each analysis ensemble member lies in the space spanned by the (i) Form {x b(i)