Accounting for observation errors in image data assimilation

This paper deals with the assimilation of image-type data. Such data, for example satellite images, have valuable properties (dense coverage in space and time), but also one crucial drawback for data assimilation: they are affected by spatially correlated errors. Classical approaches in data assimilation assume uncorrelated noise, because the proper description and numerical manipulation of non-diagonal error covariance matrices is complex. This paper proposes a simple way to provide observation error covariance matrices adapted to spatially correlated errors. This is done using various image transformations: multiscale (wavelets, Fourier, curvelets), gradients, and gradient orientations. These transformations are described and compared to classical approaches, such as pixel-to-pixel comparison and observation thinning. We provide simple yet effective covariance matrices for each of these transformations, which account for the observation error correlations and improve the results. The effectiveness of the proposed approach is demonstrated on twin experiments performed with a 2-D shallow-water model.


Motivation
In the last 30 years, data assimilation techniques have become very popular in geophysics. They aim to combine, in an optimal way and through priors on the involved errors, different kinds of information in order to provide an accurate estimate of the geophysical state.
Because of the difficulty of accurately describing the dynamics of the ocean or the atmosphere, mathematical models are in practice idealised and simplified representations of reality. Observations are then necessary to monitor the evolution of these geophysical states. Observations used in operational systems historically come from synoptic data. Those data are collected by stations, aircraft, radiosondes, balloons, and drifters. The distribution of such observations is sparse and heterogeneous both in space and time.
Since the end of the 1970s, many satellites have been launched to improve our knowledge of the atmosphere and of the oceans. Geostationary satellites provide, among other data, photographic images of the Earth system. Sequences of such images show the dynamical evolution of identified meteorological or oceanic 'objects': fronts, clouds, eddies, vortices and so on. Human vision can easily detect the dynamics in this kind of image sequence, which clearly has a strong predictive potential. This aspect is favoured by the fact that image data, contrary to many other measurements, are dense in space and time. Indeed, the spatial resolution of current METEOSAT satellites is close to 1 km and they produce a full image every 15 min. This frequency will be improved to one image every 10 min (and even every 2.5 min for Europe only) for the upcoming third satellite generation. This implies a huge quantity of information, which can be seen as an asset but also induces difficulties for the assimilation system, which has to cope with such an amount of data.
Satellite data are currently used in operational data assimilation systems, mainly through the assimilation of the radiance measured by the satellite at each pixel of the image.
These radiances are related to physical quantities such as surface temperature, sea surface height, cloud pressure, and chlorophyll concentration. However, studies such as Desroziers et al. (2009) have shown that, although radiances measured by satellite have an important impact on the assimilation, each radiance observation has a small impact compared to that of one in situ observation. This phenomenon is due to the fact that radiances are integrated measures. It is thus difficult to have an accurate representation of the observation errors, and moreover these errors are highly correlated. As a consequence, the prescribed error statistics associated with radiance measurements are artificially inflated in operational systems.
In practice only a tiny percentage (about 3 to 5 %) of the total satellite data (from polar orbiting and geostationary satellites) is used in operational numerical weather prediction (NWP) systems, and it is given low confidence within the assimilation systems. Considering the cost of satellite observing systems (the launch of the Meteosat Third Generation is estimated at around 2.5 billion Euros) and of the infrastructures required for the collection of the data itself, improving their impact on forecasting systems is an important topic of research.
One important scientific issue related to satellite data is to determine what information can, and should, be assimilated from images. In the literature, several methods dedicated to the assimilation of image sequences have been proposed. Two classes of approaches can be distinguished. The assimilation of pseudo-observations consists of first extracting dynamical information from the images and then using the result as observations, whereas direct observation relies on the direct assimilation of the image data.

Pseudo-observations
The radiance measured at each pixel is relevant information, but it does not give any detail on the structures, such as the fronts of geophysical entities, that can be observed in images. When looking at a satellite image sequence, the human eye notices the evolution of structures through the deformation of isophote lines. Structure information is even available when looking at individual images, as illustrated by the different satellite observations of the ocean presented in Fig. 1.
In NWP, this information on the dynamics is currently assimilated through so-called atmospheric motion vectors (AMVs). The aim is to estimate the motion of some identified structures from one image to another using correlation techniques (Lucas and Kanade, 1981). The resulting vector field is then assimilated as wind data. More complex strategies have also been designed by Michel (2011) in order to characterise specific clouds, track their motion along the image sequence and assimilate their Lagrangian trajectories.
Due to their very indirect nature and the complexity of the pre- and post-processing [a thorough description of these processes can be found in Schmetz et al. (1993) and Nieman et al. (1997)], describing the errors associated with such sparse wind data is not straightforward. In particular, these errors are correlated, so that complex observation error covariance matrices have to be built, or the errors have to be significantly inflated, which reduces their impact. Bormann et al. (2003) found statistically significant spatial error correlations on scales up to about 800 km, which are moreover strongly anisotropic.
Generally, AMVs are thought to be very useful. However, as shown by Cardinali (2009), such observations can in some cases have a negative impact on the assimilation.
Dense motion estimation has also been considered for assimilating the image dynamics. The motion between two consecutive images can indeed be computed through image processing techniques based on dense optical flow estimation (Horn and Schunck, 1981). The resulting 2-D vector fields can then be associated with the velocity at the sea surface or at cloud altitude and assimilated as pseudo-observations, as proposed by Korotaev et al. (2008) and Papadakis and Mémin (2007).
Other pseudo-observation methods based on the assimilation of image gradients have been proposed. Gradients contain pertinent information, which has been successfully used for front tracking in oceanographic image sequences (Ba and Fablet, 2010) and for the assimilation of wave directions (Aouf et al., 2006).
The pseudo-observation approach nevertheless suffers from the difficulty of modelling the observation errors. Indeed, the error due to the image processing technique itself cannot generally be quantified accurately. For instance, the dense optical flow methods that estimate a 2-D velocity from scalar images involve an ill-posed problem, and an artificial regularisation of the vector field is required. Such regularisation introduces errors in the estimation which are difficult to model. As a consequence, when dealing with pseudo-observations, it is hard to distinguish image processing errors from the ones inherent to the satellite acquisition process.

Direct comparison
A direct comparison between observations and model is also possible. For instance, radiance observations are directly assimilated (Köpken et al., 2004). This is made possible by the use of a specific observation operator named RTTOV [see, for instance, Matricardi et al. (2004)], which enables a direct comparison of the model variables with the observations.
In this work, we use a data assimilation method aiming at retrieving information on the velocity field from image observations. For this kind of assimilation, a direct comparison between the image temporal variation and the state velocity has been studied by Papadakis and Mémin (2008), through the introduction of a non-linear observation equation based on the optical flow constraint equation. This overcomes the problem of artificial regularisation by only considering the photometric information. Another approach has been developed by Titaud et al. (2010), where observed images are compared directly to a synthetic image derived from the model state variable.
More recently, another approach using image gradients has been proposed by Titaud et al. (2011) and continued by Gaultier et al. (2013). The idea here consists of extracting from the dynamics a feature map that can be compared with a gradient map computed on images. The feature map is composed of Lyapunov exponents or vectors (D'Ovidio et al., 2009) obtained through an integration of the positions of particles added to the model. More precisely, the computation consists of perturbing a particle position and measuring the direction and the amplitude of the deviation after a finite integration time. In oceanography, this information is pertinent since strong Lyapunov coefficients are correlated with chlorophyll discontinuities at the surface, which can be observed from satellite images. The process nevertheless relies on a binarisation of both the Lyapunov information and the image gradients, which makes its integration in data assimilation systems difficult, as it involves non-differentiable processes and degrades the information.

Specification of observation errors
The specification of observation error statistics is an important topic in data assimilation. Indeed, they define the weight of each observation and the way the analysis can differ from the observation set (through the error covariances).
Estimating the error statistics of an observation set is a mathematically difficult problem and has seldom been considered thoroughly in data assimilation. Desroziers et al. (2005) introduced a method to diagnose the consistency of background and observation error variances and cross-correlations, under the hypothesis that the assimilation scheme relies on statistical linear estimation theory. On top of these estimation difficulties, introducing the information about cross-correlations in the assimilation scheme is a complex task, due to the size of the problem and the need to invert the obtained covariance matrix. To circumvent this complexity, Stewart et al. (2013) proposed to use Markov matrices in order to model correlations. The inverses of such matrices are tridiagonal and require the storage of only one parameter. The authors show that, on 1-D synthetic test cases, using information on correlations (or even an approximation of this information) leads to better results than considering the observation errors uncorrelated.
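The tridiagonal-inverse property mentioned above can be checked numerically. The following sketch (illustrative values only, not taken from Stewart et al.) builds a 1-D Markov correlation matrix C[i, j] = rho**|i - j| and the closed-form tridiagonal expression of its inverse, showing that the single parameter rho suffices to apply the inverse exactly:

```python
import numpy as np

def markov_matrix(n, rho):
    # 1-D Markov (AR(1)-type) correlation matrix: C[i, j] = rho**|i - j|.
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def markov_inverse(n, rho):
    # Closed-form tridiagonal inverse, scaled by 1 / (1 - rho^2).
    a = 1.0 / (1.0 - rho**2)
    inv = np.zeros((n, n))
    np.fill_diagonal(inv, a * (1.0 + rho**2))
    inv[0, 0] = inv[-1, -1] = a          # corner entries lack one neighbour
    off = -a * rho
    inv[np.arange(n - 1), np.arange(1, n)] = off
    inv[np.arange(1, n), np.arange(n - 1)] = off
    return inv

C = markov_matrix(6, 0.7)
print(np.allclose(markov_inverse(6, 0.7) @ C, np.eye(6)))  # True
```

Since only `rho` needs to be stored, applying the inverse correlation matrix costs no more than a tridiagonal matrix-vector product.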
The specification of observation errors for image-type observations is a challenge. Whatever the approach used (pseudo-observation or direct comparison), the usual assumption of uncorrelated errors between observations is clearly invalid. Additionally, the size of the data is likely to make the handling of the error covariance matrices cumbersome. The main topic of this paper is to propose and compare new approaches to specify non-trivial observation error correlations in a computationally efficient way.

Proposed approach and overview of the paper
This paper compares different observation transformations dedicated to image-type observations. Two of them have already been proposed in the literature: the classical one-to-one pixel comparison [see, for instance, Köpken et al. (2004)] and the comparison in a curvelet space (Titaud et al., 2010), extended in this paper to comparisons in orthonormal wavelet and Fourier spaces. Two new transformations based on image gradients and gradient orientations are also proposed. Their relative merits are discussed, with special attention to the description of the corresponding error statistics specification. In particular, when the observation noise is correlated in space, the usual diagonal approximation of the observation error covariance matrix is not sufficient for the pixel-to-pixel comparison, while the use of other transformations allows this approximation to remain appropriate.
Some basic notations of variational data assimilation are first recalled in Section 2. Observation transformations are then defined in Section 3. The specification of observation error statistics for each transformation is discussed in Section 4, and non-trivial error covariance matrices for the assimilation are proposed. Finally, Section 5 presents the experimental framework and some numerical results.

Variational data assimilation
Variational data assimilation is a technique based on optimal control theory which seeks to estimate the initial or boundary conditions of a given system. This is done through the minimisation of a cost function accounting for the observations and the model equations of this system.
Let $\mathcal{V}$ be the state space, identified with its dual, defined over $\Omega \subset \mathbb{R}^n$ (e.g. $\mathcal{V} = H^1(\Omega)$). The evolution of the state variable $X \in \mathcal{W}(t_0,t_f) = \{f \,|\, f \in L^2(t_0,t_f;\mathcal{V})\}$ is assumed to be described by a (possibly non-linear) differential dynamical model $\mathbb{M} : \mathcal{V} \to \mathcal{V}$:
$$\partial_t X(t) = \mathbb{M}(X(t)), \qquad X(t_0) = X_0,$$
where $X_0 \in \mathcal{V}$ is a control parameter. We then consider noisy observations $Y \in \mathcal{O}$ of the state variables, where $\mathcal{O} \subset \mathbb{R}^m$ is the observation space. These observations may belong to a different space from the state variable. We will nevertheless assume that there exists a differentiable operator $\mathbb{H} : \mathcal{V} \to \mathcal{O}$ that maps the state space to the observation space. The control problem states as follows: find the control parameter $X_0$ which best fits the observations over the time range $[t_0,t_f]$. To do so, we form and minimise the following cost function:
$$J(X_0) = \frac{1}{2} \int_{t_0}^{t_f} \big\| \mathbb{H}(X(t)) - Y(t) \big\|_{R}^{2}\, \mathrm{d}t,$$
where $R$ is the observation error covariance matrix and the norm is $\|u\|_R^2 = u^T R^{-1} u$ for all $u \in \mathcal{O}$. It is classically known that this problem is ill-posed. Therefore, a regularisation term, involving a so-called background state $X^b$, is added to the cost function:
$$J(X_0) = \frac{1}{2} \big\| X_0 - X^b \big\|_{B}^{2} + \frac{1}{2} \int_{t_0}^{t_f} \big\| \mathbb{H}(X(t)) - Y(t) \big\|_{R}^{2}\, \mathrm{d}t,$$
where $B$ is the background error covariance matrix. In this paper, $B$ is based on the generalised diffusion equation, as proposed by Weaver and Courtier (2001).
The minimum of J is sought using a gradient descent algorithm, and the gradient is generally computed using an adjoint method, as advocated by Le Dimet and Talagrand (1986).
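As a minimal illustration of this minimisation principle, the sketch below solves a static, linear analogue of the variational problem (all sizes, operators and values are made up for the example, and a plain fixed-step descent replaces the adjoint machinery of the paper):

```python
import numpy as np

# Static, linear analogue of the variational cost
#   J(x) = 1/2 (x - xb)^T B^{-1} (x - xb) + 1/2 (H x - y)^T R^{-1} (H x - y),
# minimised by gradient descent.
rng = np.random.default_rng(0)
n, m = 4, 3
B_inv = np.eye(n)                    # inverse background covariance
R_inv = np.eye(m)                    # inverse observation covariance
H = rng.standard_normal((m, n))      # linear observation operator
xb = np.zeros(n)                     # background state
y = rng.standard_normal(m)           # observations

def grad_J(x):
    # Gradient of the cost: B^{-1}(x - xb) + H^T R^{-1}(H x - y).
    return B_inv @ (x - xb) + H.T @ R_inv @ (H @ x - y)

hess = B_inv + H.T @ R_inv @ H
step = 1.0 / np.linalg.eigvalsh(hess).max()   # step below the curvature bound
x = xb.copy()
for _ in range(500):
    x -= step * grad_J(x)

# The quadratic problem has a closed-form minimiser to compare against.
x_star = np.linalg.solve(hess, B_inv @ xb + H.T @ R_inv @ y)
print(np.allclose(x, x_star, atol=1e-8))  # True
```

In the full 4D-Var setting of the paper, `grad_J` would instead be evaluated by one forward model integration and one adjoint integration.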

Direct assimilation of images
The general framework needed to directly assimilate images is described in Section 3.1. Sections 3.2.1 and 3.2.2 recall the pixel comparison and the multiscale transformations. Then Sections 3.2.3 and 3.2.4 introduce new transformations based on image gradients.

General framework
Let $y^o$ be an image observation of a direct or indirect output variable of the mathematical model.
The functional to minimise reads:
$$J(X) = \frac{1}{2} \int_{t_0}^{t_f} \big\| A(\mathbb{H}(X)) - A(y^o) \big\|_{R_A}^{2}\, \mathrm{d}t,$$
where $X$ denotes the unknown state initial conditions that we want to estimate, including, for instance, the initial velocity $w_0$. $A$ represents a transformation that can be applied to the image and its model counterpart, and $R_A$ the associated observation error covariance matrix.

Observation transformations
This section describes the transformations $A$ used to compare the images and their model equivalents.

3.2.1. Pixel.
A first idea consists of defining $A$ as the identity function, so that the image comparison is performed in the pixel space. The functional to minimise is:
$$J_{Pix}(X) = \frac{1}{2} \int_{t_0}^{t_f} \big\| \mathbb{H}(X) - y^o \big\|_{R_{Pix}}^{2}\, \mathrm{d}t,$$
where $R_{Pix}$ stands for the covariance matrix in this space.
3.2.2. Multiscale. In order to model the spatial information, an idea is to consider a multiscale representation of an image. Titaud et al. (2010) proposed to use the curvelet representation defined in Candès et al. (2006). The very same process can be applied with other multiscale representations.
In this paper, we propose to also use the Fourier transform and an orthonormal periodic wavelet transform. This choice is due to the fact that Fourier and wavelet bases involve faster transforms and are easier to work with. The Fourier, wavelet and curvelet transforms used in this paper come from the MCALab toolbox (see Fadili et al., 2010).
These transforms rely on the fact that an image $y^o$ can be decomposed as:
$$y^o = \sum_{j,k,l} \langle y^o, \varphi_{j,k,l} \rangle\, \varphi_{j,k,l},$$
where $\{\varphi_{j,k,l}\}_{j,k,l}$ are the elements of the curvelet tight frame, of the orthonormal wavelet basis or of the Fourier basis. The parameters $j$, $k$, $l$ specify the scale, the position and the orientation of each element. These coefficients are then compared with their synthetic counterparts. Hence, the functional to minimise reads:
$$J_{m}(X) = \frac{1}{2} \int_{t_0}^{t_f} \big\| A_m(\mathbb{H}(X)) - A_m(y^o) \big\|_{R_{m}}^{2}\, \mathrm{d}t,$$
where $A_m$ stands for the multiscale transform and $R_m$ for the associated error covariance matrix. Notice that a Parseval equality holds for these multiscale transforms, so that the norm of the coefficient vector equals the norm of the image.

3.2.3. Gradient. Considering that the pertinent information on the flow dynamics is localised on image singularities, we also propose to compare the 2-D gradients $\nabla = [\partial_x\ \partial_y]^T$ of the image and of its model counterpart. Hence, the observation transformation is defined as $A_\nabla(y^o) = \nabla y^o$.
As a result, the cost function becomes:
$$J_{\nabla}(X) = \frac{1}{2} \int_{t_0}^{t_f} \big\| \nabla \mathbb{H}(X) - \nabla y^o \big\|_{R_{\nabla}}^{2}\, \mathrm{d}t,$$
where $R_\nabla$ is the covariance matrix relative to the observation errors of $A_\nabla$.
The corresponding tangent linear transformation, for a small perturbation $\eta$ around $q$, is $\partial_q A_\nabla(\eta) = \nabla \eta$. The adjoint transformation applied to a vector $u$ reads $(\partial_q A_\nabla)^{*} u = -\nabla \cdot u$. Note that there exists an infinity of different images sharing the same gradient map: the family $y^o_c = y^o + c$, for all $c \in \mathbb{R}$, leaves $\nabla y^o$ unchanged, so this transformation cannot correct the mean intensity of the image.
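The gradient transformation and its adjoint can be sketched with central differences, as below. Periodic boundary handling is an assumption made for the example (the paper does not specify it here); the adjoint identity $\langle \nabla I, u \rangle = \langle I, \nabla^{*} u \rangle$ is verified numerically:

```python
import numpy as np

# Central-difference gradient: (grad I)_x[i, j] = (I[i+1, j] - I[i-1, j]) / 2,
# and similarly in y, on an assumed periodic grid.
def grad(img):
    gx = (np.roll(img, -1, axis=0) - np.roll(img, 1, axis=0)) / 2.0
    gy = (np.roll(img, -1, axis=1) - np.roll(img, 1, axis=1)) / 2.0
    return gx, gy

def grad_adjoint(gx, gy):
    # Adjoint of the central-difference gradient = minus its divergence.
    dx = (np.roll(gx, 1, axis=0) - np.roll(gx, -1, axis=0)) / 2.0
    dy = (np.roll(gy, 1, axis=1) - np.roll(gy, -1, axis=1)) / 2.0
    return dx + dy

# Adjoint (dot-product) test: <grad I, u> == <I, grad^T u>.
rng = np.random.default_rng(1)
I = rng.standard_normal((16, 16))
ux = rng.standard_normal((16, 16))
uy = rng.standard_normal((16, 16))
gx, gy = grad(I)
lhs = np.sum(gx * ux + gy * uy)
rhs = np.sum(I * grad_adjoint(ux, uy))
print(np.isclose(lhs, rhs))  # True
```

Note also that `grad(I + c)` equals `grad(I)` for any constant `c`, which is exactly the invariance family discussed above.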

3.2.4. Angular.
A last transformation can be introduced to focus on structure orientations. The idea is to compare the normalised 2-D gradients of the image and of its model counterpart by measuring their angular difference, i.e. the $L^2$ misfit between $\nabla y^o / \|\nabla y^o\|$ and $\nabla \mathbb{H}(X) / \|\nabla \mathbb{H}(X)\|$. In order to avoid an ill-posed problem where the gradient vanishes, the normalisation is regularised, so that the transformed observation $A_{Ang}(y^o)(x,t)$, for $x \in \Omega$ and $t \in [t_0,t_f]$, is either null (where the gradient is null) or almost equal to the unit vector $\nabla y^o / \|\nabla y^o\|$. Following the notations of Section 2, the corresponding functional reads:
$$J_{Ang}(X) = \frac{1}{2} \int_{t_0}^{t_f} \big\| A_{Ang}(\mathbb{H}(X)) - A_{Ang}(y^o) \big\|_{R_{Ang}}^{2}\, \mathrm{d}t,$$
where $R_{Ang}$ is the covariance matrix relative to the observation errors. As the observation transformation $A_{Ang}$ is non-linear, we need to compute the tangent linear transformation $\partial_{\mathbb{H}(X)} A_{Ang}$; denoting $q = \mathbb{H}(X)$, it is obtained for a small perturbation $\eta$ around $q$, and the corresponding adjoint transformation applied to a vector $u$ provides the right-hand side of the adjoint model. Once again, there exists an infinite number of different images $y^o$ sharing the same gradient directions, such as the family $y^o_{a,c} = a y^o + c$, $\forall a \in \mathbb{R}$ and $c \in \mathbb{R}$. As a consequence, the angular transformation is not able to correct the intensity of the image and will only focus on correcting the position of structures.
The tangent linear transformation acts anisotropically: where the gradient $\nabla \mathbb{H}(X)$ is not null, most of the information is sent along $\nabla^{\perp} \mathbb{H}(X)$, the direction orthogonal to the current state gradient. On the other hand, this transformation simply realises an isotropic diffusion where $\nabla \mathbb{H}(X)$ is a null vector.
According to the current values of $\nabla \mathbb{H}(X)$ and $\nabla y^o$, the minimisation of (10) produces the following effects on $\mathbb{H}(X)$: it diffuses the orientations in the areas where $\nabla y^o$ is null; it imposes the observed orientation $\nabla y^o / \|\nabla y^o\|$ where $\|\nabla y^o\| > 0$; and it adjusts $\nabla \mathbb{H}(X)$ towards the observed orientation where both $\nabla y^o$ and $\nabla \mathbb{H}(X)$ are non-null.
These properties illustrate how the structure orientation of the images is transferred to the model variable.

Error description
When dealing with real satellite data, noise can corrupt the signal coming from the Earth's surface or the atmosphere. This noise is due to measurement uncertainties as well as to perturbations induced by the vertical integration of the signal (as the atmosphere can contain clouds, rain, snow, etc.). The induced errors must be characterised in order to perform an optimal analysis.
The measurement norm (3) involved in the observation cost function (2) requires the inverse of the observation error covariance matrix R. As these matrices can be huge (and even non-invertible), one needs to make some approximations.
In this part we focus on building approximate covariance matrices that are easily invertible in selected spaces. We would like to underline that, in a Fourier, wavelet, curvelet, gradient or angular space, a diagonal covariance matrix can hold information on error correlations in the pixel space (this is discussed in more detail in Section 4.3 for the wavelet case). This property has already been used in Fourier space to model homogeneous correlations in the background error covariance matrix [for instance, in Courtier et al. (1998)] and in wavelet spaces to model heterogeneous correlations [for instance, in Fisher (2003), Deckmyn and Berre (2005) and Pannekoucke (2009)].

Notations
In the following, the true error covariance matrices are denoted $C^{noise}_{Space}$, depending on the noise type and on the considered observation space.
Let us first consider that the available images $I^o$ correspond to the true state $I^*$ corrupted by an additive, time-independent error $\eta$. The observation equation reads, in the pixel space, $I^o = I^* + \eta$. Assuming that the error covariance matrix $C^{noise}_{Pix}$ is known, the error covariance matrix associated with a linear transform $A$ of the observed image is:
$$C^{noise}_{A} = A\, C^{noise}_{Pix}\, A^{T},$$
where $A$ can be a multiscale transformation described in Section 3.2.2 or the gradient transformation of Section 3.2.3. As these matrices can be huge (and even non-invertible), they must be approximated, for instance by a scalar matrix $R_{Scalar} = r^2\, Id_n$ (where $Id_n$ is the $n \times n$ identity matrix), by a diagonal matrix $R_{Diag}$, or by a block diagonal matrix $R_{Block}$. For instance, when considering an independent and identically distributed (iid) noise, the matrix $R^{iid}_{\nabla,Block}$ is a block diagonal approximation of the true error covariance matrix $C^{iid}_{\nabla}$ for the gradient space. Throughout this paper, we use these denominations (scalar, diagonal, block diagonal) to refer to the above approximations; in particular, 'scalar' means 'proportional to the identity'.
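The relation $C_A = A\, C_{Pix}\, A^T$ and its diagonal approximation can be sketched on a small 1-D periodic example (the sizes, the correlation parameter and the periodic wrap are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

# Exact covariance of linearly transformed noise, C_A = A C_pix A^T.
n, sigma2, rho = 8, 1.0, 0.6
idx = np.arange(n)
dist = np.minimum(np.abs(idx[:, None] - idx[None, :]),
                  n - np.abs(idx[:, None] - idx[None, :]))  # circular distance
C_pix = sigma2 * rho ** dist                                # correlated noise cov.

# Periodic central-difference operator standing in for the transformation A.
A = np.zeros((n, n))
A[idx, (idx + 1) % n] = 0.5
A[idx, (idx - 1) % n] = -0.5

C_A = A @ C_pix @ A.T            # exact transformed covariance
R_diag = np.diag(np.diag(C_A))   # its 'diagonal' approximation
# On this periodic grid every diagonal entry equals sigma^2 (1 - rho^2) / 2.
print(np.allclose(np.diag(C_A), sigma2 * (1 - rho**2) / 2))  # True
```

The full matrix `C_A` is not diagonal; keeping only `R_diag` is exactly the kind of approximation discussed in the remainder of this section.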

Independent additive noise
Assuming that images are corrupted by a Gaussian white noise, independent and identically distributed in space and time, let us describe the error covariance matrices built in this case for the various image comparisons introduced in Sections 3.2.1 to 3.2.4.
Pixel case: The true error covariance matrix $C^{iid}_{Pix}$ is scalar. As a consequence, there is no need to approximate this matrix:
$$R^{iid}_{Pix,Scalar} = C^{iid}_{Pix} = \sigma^2\, Id_n,$$
where the variance $\sigma^2$ monitors the noise amplitude.

Fourier case:
Here again, the true covariance matrix $C^{iid}_{Fou}$ is scalar. As a consequence, there is no need to approximate this matrix:
$$R^{iid}_{Fou,Scalar} = C^{iid}_{Fou} = \sigma^2\, Id_n,$$
where $\sigma^2$ monitors the noise amplitude.

Wavelet case:
The error covariance matrix in a wavelet space is:
$$C^{iid}_{W} = W\, C^{iid}_{Pix}\, W^{T},$$
where $W$ is the wavelet transform defined in Section 3.2.2. As only orthonormal wavelet transforms are used in our study, we have $W W^{T} = Id_n$. Then, whatever the chosen wavelet basis, provided that it is orthonormal, the error covariance matrix is scalar: $R^{iid}_{W,Scalar} = \sigma^2\, Id_n$.

Curvelet case: The curvelet transform by wrapping (denoted $C$), defined in Candès et al. (2006), is a redundant (and therefore non-invertible) transform. A pseudo-inverse of the curvelet transform by wrapping is the adjoint of the forward transform ($C^T$). It enables a perfect reconstruction of the original image given the full set of curvelet coefficients.
The error covariance matrix in a curvelet space is nevertheless not invertible, as it reads $C^{iid}_{Curv} = \sigma^2\, C C^{T}$. As a consequence, we do not try to construct it explicitly.
However, one can see that a sufficient condition to build a cost function in the curvelet space equal to the cost function in the pixel space is to choose $R^{iid}_{C,Scalar} = \sigma^2\, Id_k$. This comes from the fact that $C^T$ is the pseudo-inverse of $C$, meaning that $C^T C = Id_n$.

Remark 2. Keeping all of the coefficients of a multiscale family leads, in this case, to minimising exactly the same cost function as in the pixel case. Indeed, this follows from eq. (7) and the fact that the considered adjoints are equal to their pseudo-inverses. As a consequence, there will be no results referring to multiscale decompositions when the observation error is uncorrelated in space.
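The key fact used above, that an orthonormal transform leaves a scalar covariance scalar, can be checked numerically. The sketch below uses the unitary DFT matrix as a stand-in for any orthonormal transform $W$ (an illustrative choice; the paper's wavelet bases behave identically):

```python
import numpy as np

# For an orthonormal (unitary) transform W, the covariance of transformed
# white noise stays scalar: W (sigma^2 I) W^H = sigma^2 I.
n = 16
sigma2 = 2.0
W = np.fft.fft(np.eye(n), norm="ortho")      # unitary DFT matrix as stand-in
C_W = W @ (sigma2 * np.eye(n)) @ W.conj().T  # transformed covariance
print(np.allclose(C_W, sigma2 * np.eye(n)))  # True
```

No approximation is therefore needed in the iid case, exactly as stated for the Fourier and orthonormal wavelet transforms.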

Gradient case:
The transformation $\nabla$ is not invertible, and neither is the error covariance matrix $C^{iid}_{\nabla} = \sigma^2\, \nabla \nabla^{T}$. In Section 5, we use two different approximations of this matrix. The first is a scalar matrix corresponding to the diagonal of the true error covariance matrix; in this study, as central differences are used to compute the partial derivatives, the variance reads $r^2 = \sigma^2 / 2$. The second is a block diagonal matrix, built from $C^{iid}_{\nabla}$ under the hypothesis that there is no interaction between the errors made on the x- and y-derivatives. The construction and inversion of this matrix are detailed in Appendix A.
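The variance value $r^2 = \sigma^2/2$ follows from the central-difference stencil: $(\eta_{i+1} - \eta_{i-1})/2$ has variance $(\sigma^2 + \sigma^2)/4$ for iid noise. A quick Monte-Carlo check (sample size and seed are arbitrary):

```python
import numpy as np

# For iid noise with variance sigma^2, a central difference
# (eta[i+1] - eta[i-1]) / 2 has variance sigma^2 / 2.
rng = np.random.default_rng(2)
sigma = 1.5
eta = sigma * rng.standard_normal(200_000)
d = (eta[2:] - eta[:-2]) / 2.0
print(abs(d.var() - sigma**2 / 2) < 0.05)  # empirical variance close to sigma^2/2
```

The same computation with spatially correlated noise is revisited in the next subsection, where the cross-term no longer vanishes.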

Angular case:
As the angular distance defined in Section 3.2.4 is non-linear, it is difficult to build statistics for this case. Indeed, the formal computation of the mean and covariances of the transformed noise is out of reach. Therefore, we consider that observations are only reliable where $\|\nabla y^o\|$ exceeds a threshold; in that case the noise does not have much impact on the gradient direction. Accordingly, we design the variance of both components of $\nabla y^o(i,j) / \|\nabla y^o(i,j)\|$ to be almost equal to $\sigma^2$ where the gradient norm is above the threshold, and very large below it (so that the observation is almost discarded). The resulting diagonal matrix $R^{iid}_{Ang,Diag}$ is the covariance matrix used in the twin experiments of Section 5.

Correlated Gaussian noise
Let us now assume that $\eta$ is an additive, spatially correlated Gaussian noise. Ideally, the observation error matrices should contain off-diagonal elements to represent these correlations. However, as mentioned before, the size of such matrices for realistic applications is, most of the time, far too large to be handled properly. The aim of this section is to propose computationally efficient approximations of the observation error covariances. We do so by combining the transformations $A$ with suitably chosen diagonal matrices $R$.

Pixel case:
In these experiments, the noise distribution is chosen such that each pixel variance is the same. Therefore, in the pixel space the approximation is scalar and reads $R^{cor}_{Pix,Scalar} = \sigma^2\, Id_n$.

Wavelet case: A naive approach would be to build the error covariance matrix in wavelet space from the diagonal assumption in the pixel space given by eq. (24). Doing so leads to defining the approximation of the covariance matrix in any orthonormal wavelet subspace as $R^{cor}_{W,Scalar} = \sigma^2\, Id_n$. As presented for the uncorrelated case in Remark 2, the cost function in the pixel space involving the matrix $R^{cor}_{Pix,Scalar}$ is equal to the one in the wavelet space involving $R^{cor}_{W,Scalar}$. Therefore, in this work, no results will be presented with this approximation.
One can also build an approximate covariance matrix from the true covariance matrix in a wavelet space, $C^{cor}_{W}$. The latter can be constructed from $C^{cor}_{Pix}$ using eq. (13). In Vannucci and Corradi (1999), the authors provide an algorithm to do so for a 1-D wavelet basis. We developed a 2-D version of this algorithm to perform the experiments presented in Section 5.4.
In this study, we approximate the true error covariance matrix $C^{cor}_{W}$ by its diagonal. This process is applied with the Haar basis and the Daubechies basis with eight vanishing moments; the corresponding covariance matrices are denoted $R^{cor}_{Haar,Diag}$ and $R^{cor}_{D8,Diag}$.

Remark 3. As the transformation $A_m = W$ is bijective, the information used by the assimilation process is the same as in the pixel comparison. Therefore, the only difference between (6) and (7) lies in the description of the error statistics through $R^{cor}_{Haar,Diag}$ or $R^{cor}_{D8,Diag}$, coupled with the transformations $W = Haar$ or $W = D8$.
Curvelet and Fourier cases: In these cases, we do not build the true covariance matrices $C^{cor}_{Curv}$ and $C^{cor}_{Fou}$. We rather estimate the diagonal of each matrix (the variances) from 1000 independent noise realisations.
The variance of a given (Fourier or curvelet) coefficient is estimated as the empirical variance over these realisations.

Gradient case: When the noise is identically distributed and the correlations are isotropic, the variance of each gradient component is the same. Therefore, taking into account information exclusively about the variances leads to working with the scalar matrix $R^{cor}_{\nabla,Scalar} = r^2\, Id$. Note that, as central differences are used to compute the partial derivatives, the variance reads $r^2 = \sigma^2 (1 - p)/2$, where $p$ is the correlation between the errors of the two pixels involved in the central-difference stencil. When $p$ is close to 1, $r^2$ is far smaller than $\sigma^2$, so the noise in the gradient space is smaller than in the pixel space.
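The gradient-space variance reduction can be checked by Monte Carlo on a 1-D example. The noise model below (a Markov correlation $\rho^{|i-j|}$ sampled via Cholesky, with $p = \rho^2$ the correlation at lag 2) is an illustrative assumption, not the paper's noise:

```python
import numpy as np

# Check: for stationary correlated noise, a central difference has variance
# sigma^2 (1 - p) / 2, with p the correlation between the two stencil pixels.
rng = np.random.default_rng(3)
n, sigma2, rho = 300, 1.0, 0.9
idx = np.arange(n)
C = sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])  # corr(lag k) = rho^k
L = np.linalg.cholesky(C)
eta = L @ rng.standard_normal((n, 10_000))               # 10000 realisations
d = (eta[2:, :] - eta[:-2, :]) / 2.0                     # central differences
p = rho**2                                               # correlation at lag 2
print(abs(d.var() - sigma2 * (1 - p) / 2) < 0.01)        # matches the formula
```

With $\rho = 0.9$ the gradient-space variance drops to roughly a tenth of the pixel variance, illustrating why strongly correlated noise is attenuated by the gradient transformation.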

Angular case:
As it is very complex to specify the effect of a correlated noise for this transformation, we simply use the covariance matrix built for uncorrelated noise:

$$R^{cor}_{Ang,Diag} = R^{iid}_{Ang,Diag}.$$

Let us now describe what kind of correlations in the pixel space are taken into account in the cost function by the proposed diagonal matrices in wavelet, curvelet and Fourier spaces. To do so, we build a cost function in the pixel space (6) equivalent to the one in a multiscale space (7) by creating a new covariance matrix, named $R^{cor}_{Pix,Eq}$, defined through
$$\big(R^{cor}_{Pix,Eq}\big)^{-1} = A_m^{T}\, \big(R^{cor}_{A_m,Diag}\big)^{-1} A_m,$$
so that $\| A_m u \|^2_{R^{cor}_{A_m,Diag}} = \| u \|^2_{R^{cor}_{Pix,Eq}}$ for every $u$. Therefore, the covariance matrix $R^{cor}_{Pix,Eq}$ used in the pixel space yields a cost function equal to the one involving the transform $A_m$ in association with the diagonal matrix $R^{cor}_{A_m,Diag}$. Thanks to this equivalent matrix in the pixel space, it is easy to visualise the induced correlations: Fig. 2 exhibits the error correlations between a pixel and its neighbourhood at nine different locations.
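The mechanism by which a diagonal matrix in a transform space encodes pixel-space correlations can be sketched on a small example. The unitary DFT and the frequency-dependent variances below are illustrative choices; for a unitary transform $A$, the equivalent pixel covariance reduces to $A^{H} R_{Diag} A$:

```python
import numpy as np

# A *diagonal* R in an orthonormal transform space is equivalent to a
# *non-diagonal* covariance in the pixel space: R_eq = A^H R_diag A (A unitary).
n = 8
A = np.fft.fft(np.eye(n), norm="ortho")       # unitary DFT as the transform
var = 1.0 + 3.0 * np.abs(np.fft.fftfreq(n))   # symmetric spectrum of variances
R_diag = np.diag(var)
R_eq = (A.conj().T @ R_diag @ A).real         # equivalent pixel-space covariance
off = R_eq - np.diag(np.diag(R_eq))
print(np.abs(off).max() > 1e-3)  # True: equivalent pixel errors are correlated
```

Only when the diagonal of `R_diag` is flat (a scalar matrix) does `R_eq` stay diagonal; any frequency dependence of the variances induces off-diagonal pixel-space correlations, which is what Fig. 2 visualises.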
The top left panel presents the true error correlations extracted from $C^{cor}_{Pix}$. The correlations are isotropic and identical for each pixel far from the boundary. The other panels present the equivalent correlations in the pixel space induced by a diagonal matrix in the studied spaces, using $R^{cor}_{Pix,Eq}$. The correlations modelled using the Fourier, curvelet or Daubechies transforms look similar to the true ones, despite being built using diagonal matrices.
However, one can notice that the isotropy and homogeneity of the noise are not represented in the Daubechies wavelet basis; that some spurious correlations appear in the curvelet and Daubechies spaces, unlike in the true correlation map; and that the correlation maps induced by periodic transforms (Fourier, Daubechies wavelet, curvelet) are periodic.
The impact of accounting for those correlations along the assimilation process is studied in Section 5.4.

Remark 4. The same comparison is difficult to set up with the gradient and angular transforms because they induce a loss of information. Therefore, no trivial comparison of the correlations taken into account by the gradient and angular transformations can be made.

Twin experiments and robustness to noise
The aim of this section is to investigate the performance of the proposed sparse representations of the observation error statistics (combined with the proposed transformations) depending on the noise characteristics. First the experimental framework is presented, and the transformations are validated through a noise-free experiment. Note that, from now on, we will use the word 'transformation' also for the pixel comparison, for which the transformation is simply $A = Id_n$. Then three kinds of observation noise are studied: an uncorrelated Gaussian noise and two different spatially correlated Gaussian noises. The relative merits of the studied observation transformations $A$, combined with their associated approximations of the $R_A$ matrix, are discussed.

Experimental framework and dynamical model
The experimental framework mimics the drift of a vortex on a turntable. The evolution of a vortex in the atmosphere is simulated at the CORIOLIS experimental turntable (Grenoble, France), which re-creates the effect of the Coriolis force on a thin layer of water. A complete rotation of the tank takes 60 seconds, which corresponds to one Earth rotation. The vortex is created by stirring the water and made visible thanks to the addition of a passive tracer (fluorescein). The camera is placed above the turntable, and photographs of the vortex constitute the observed image sequence. For more details about these experiments, see Flór and Eames (2002).

Fig. 2. Spatial correlations around nine selected points (top-left) and their representation with a diagonal approximation of the R matrix combined with various image transformations.
5.1.1. Numerical configuration. In this configuration, the evolution of the fluid can be represented by the shallow-water equations involving the horizontal velocity $w(x,t) = (u(x,t), v(x,t))$, where $u$ and $v$ are the zonal and meridional components of the velocity, and the water elevation $h(x,t)$. These unknown variables are defined on the spatial domain $\Omega \ni x$ and the time interval $[t_0, t_f] \ni t$. Such a model reads:

$\partial_t u - (\zeta + f)v + \partial_x B = -ru + \kappa\Delta u,$
$\partial_t v + (\zeta + f)u + \partial_y B = -rv + \kappa\Delta v,$
$\partial_t h + \partial_x(hu) + \partial_y(hv) = 0.$

The relative vorticity is denoted by $\zeta = \partial_x v - \partial_y u$ and the Bernoulli potential by $B = g^* h + (u^2 + v^2)/2$, where $g^*$ is the reduced gravity. The Coriolis parameter on the $\beta$-plane is given by $f = f_0 + \beta y$, $\kappa$ is the diffusion coefficient and $r$ the bottom friction coefficient. The following numerical values were used for the experiments: $r = 9\times10^{-7}$, $\kappa = 0$, $f_0 = 0.25$, $g^* = 9.81$ and $\beta = 0.0406$. The simulation is performed on a rectangular domain $[0,L]\times[0,H]$ representing a sub-domain of the turntable, with $L = H = 2.525$ m. The domain is discretised on a $128\times128$ uniform Arakawa C-type square grid. A finite-difference scheme is used for space discretisation. Time integration is performed using a fourth-order Runge-Kutta scheme. The time step of the turntable experiment is set to 0.01 s, which would correspond to 14.4 s in the atmosphere.
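The fourth-order Runge-Kutta time integration mentioned above can be sketched generically as follows; this is the classical RK4 step applied to a toy tendency for illustration, not the paper's actual solver.

```python
import numpy as np

def rk4_step(state, rhs, dt):
    """One classical fourth-order Runge-Kutta step for d(state)/dt = rhs(state)."""
    k1 = rhs(state)
    k2 = rhs(state + 0.5 * dt * k1)
    k3 = rhs(state + 0.5 * dt * k2)
    k4 = rhs(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# Sanity check on dq/dt = -q, whose exact solution is exp(-t).
q = np.array([1.0])
for _ in range(100):
    q = rk4_step(q, lambda s: -s, 0.01)
```

In the actual model, `state` would gather the discretised fields $(u, v, h)$ and `rhs` the finite-difference shallow-water tendencies.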

Observation operator.
The vortex temporal evolution is shown through the evolution of the fluorescein concentration. This evolution is observed by an image sequence whose grey levels correspond to the concentration of a passive tracer q transported by the velocity field.
Denoting $w$ the velocity (computed by the model $M$) transporting the passive tracer and $\nu_T$ the diffusion coefficient, we have:

$\partial_t q + w\cdot\nabla q - \nu_T \Delta q = 0. \quad (37)$

Assuming that the initial concentration of $q$ is known at time $t_0$, the dynamics of $q$ on the time interval $[t_0, t_f]$ is defined by the model (37), where the diffusion coefficient is $\nu_T = 10^{-5}$. In the following experiments the considered image observation sequence $y^o$ represents observations of $q$. As a consequence, the observation operator reads:

$H(x)(t_i) = q(t_i),$

where $q(t_i)$ comes from (37).

Remark 5. In those experiments, we assume that the initial concentration of the passive tracer is known. Therefore, we do not control $q_0$.

5.1.3. Twin experiments configuration. In order to focus on the methodological aspects we will use a so-called twin experiment framework. In this classical approach, synthetic observations are created thanks to a model simulation from a known 'true state'; then an assimilation experiment is performed starting from another 'background' state using the synthetic observations. The result of this analysis can be compared with the synthetic truth.
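The tracer model (37) can be sketched with a minimal tendency function, assuming centred differences and periodic boundaries (simplifications for illustration; the paper's actual discretisation may differ):

```python
import numpy as np

def tracer_tendency(q, u, v, dx, nu_t=1e-5):
    """Right-hand side of dq/dt = -u q_x - v q_y + nu_T * Laplacian(q),
    with centred differences and periodic boundaries (an assumption here)."""
    qx = (np.roll(q, -1, axis=0) - np.roll(q, 1, axis=0)) / (2.0 * dx)
    qy = (np.roll(q, -1, axis=1) - np.roll(q, 1, axis=1)) / (2.0 * dx)
    lap = (np.roll(q, -1, axis=0) + np.roll(q, 1, axis=0)
           + np.roll(q, -1, axis=1) + np.roll(q, 1, axis=1) - 4.0 * q) / dx**2
    return -u * qx - v * qy + nu_t * lap
```

Two easy sanity checks: a uniform tracer field has zero tendency, and for a constant velocity the total tracer mass is conserved by this periodic scheme.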
In our case, considering initial conditions at time $t_0$ of the state variables ($u^*_0$, $v^*_0$ and $h^*_0$) and of the fluorescein concentration $q^*_0$, a simulation of the dynamical model is made from $t_0 = 0$ s to $t_f = 6$ s. This defines a scenario consistent with realistic experiments, as illustrated in Fig. 3.
The simulated values of the concentration $q^*(t_i)$, at various observation times $t_i$ in $[t_0, t_f]$, are then used as observations. One such observation is taken every 0.25 s, so that we get 25 observed images per assimilation window.
In the experiments, a variational data assimilation algorithm, allowing the estimation of the state variables $(u,v,h)$, is applied. The background velocities $u^b(t_0,x)$, $v^b(t_0,x)$ are initialised at rest, while $h^b(t_0,x) = h_{mean}\ \forall x$ is initialised as a constant. The minimisation of the cost function is performed thanks to the N1QN3 solver (see Gilbert and Lemaréchal, 1989).

Notations: the true state variables and observations are denoted with the superscript *, the background state (without assimilation) with b, and the analysis (the estimated state after assimilation) with a.

Metrics for qualitative analysis.
Here we introduce some tools in order to measure different criteria on the estimated velocity fields. We mainly focus on the estimation errors for the velocities $u$ and $v$. The vorticity error and the angular error are also looked at, as they provide excellent criteria to evaluate the quality of the estimated velocity directions.
In the following, we refer to the Root Mean Square Error (RMSE) between the variables $u$ (either $u^b$ for the background without assimilation or $u^a$ for the analysis with assimilation) and the true synthetic ones $u^*$ at time $t_0$:

$RMSE(u) = \sqrt{\frac{1}{n}\sum_{x} \left(u(t_0,x) - u^*(t_0,x)\right)^2}.$

Similarly, we define the RMSE for $v$ and the vorticity $\zeta = v_x - u_y$, as well as for the angle $\alpha$ defined as:

$\alpha(x) = \arccos\left(\frac{U_E(x)\cdot U^*_E(x)}{\|U_E(x)\|\,\|U^*_E(x)\|}\right),$

where $U_E(x) = [u(t_0,x), v(t_0,x), E]$, $U^*_E(x) = [u^*(t_0,x), v^*(t_0,x), E]$ and $E$ is a constant that keeps the angle defined where the velocity vanishes. This corresponds to a modification of the angular error of Barron et al. (1994) as proposed in Souopgui (2010).
In the following, we mostly consider the ratio RMSE($u^a$)/RMSE($u^b$). This ratio represents the residual error of the analysed zonal velocity $u^a$ with respect to the error of the background zonal velocity $u^b$. A ratio smaller than 1 means that the assimilation has improved the corresponding variable with respect to the background.

Remark 6. In our experiments, as the background velocities are initialised with null values, this ratio reads:

$\frac{RMSE(u^a)}{RMSE(u^b)} = \frac{\|u^a - u^*\|}{\|u^*\|}.$

Therefore, the square of this number represents the percentage of noise (in terms of energy) present in the analysed field.

Validation with perfect data
We want to compare the behaviour of the gradient (see Section 3.2.3) and angular (see Section 3.2.4) observation transformations to the pixel, curvelet and wavelet ones (see Section 3.2) in the case of perfect data (i.e. without any noise).
As we started with a static background, we have no confidence in it. Therefore, we introduce a weight $w_b$ in the cost function:

$J(x) = w_b J_b(x) + J_o(x).$

Choosing a small $w_b$ ensures that the $J_b$ term acts only as a regularisation term, and not as a strong feedback to the background.
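A minimal sketch of such a weighted cost function and its gradient is given below, assuming a linear observation operator H and an identity background covariance (both simplifications made here for illustration, not the paper's configuration):

```python
import numpy as np

def cost_and_grad(x, x_b, y, H, R_inv, w_b):
    """J(x) = w_b ||x - x_b||^2 + (Hx - y)^T R^{-1} (Hx - y), and its gradient.
    The background term uses an identity B for simplicity (an assumption)."""
    d_b = x - x_b
    d_o = H @ x - y
    J = w_b * (d_b @ d_b) + d_o @ (R_inv @ d_o)
    grad = 2.0 * w_b * d_b + 2.0 * H.T @ (R_inv @ d_o)
    return J, grad

# A few steepest-descent steps on a small linear toy problem:
H = np.eye(4)
R_inv = np.eye(4)
y = np.ones(4)
x_b = np.zeros(4)
x = x_b.copy()
for _ in range(50):
    J, g = cost_and_grad(x, x_b, y, H, R_inv, w_b=0.01)
    x = x - 0.1 * g
```

With a small $w_b$, the minimiser stays close to the observations rather than to the (uninformative) background, which is exactly the intended behaviour.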
Figure 4 shows the evolution of the RMSE ratio of the zonal velocity $u$ along the iterations for the pixel, gradient and angular transformations. For each distance notion, this ratio decreases with the iterations. The minimisation leads to a coherent analysed field in each case. The best zonal velocity field is reconstructed with the angular observation transformation.
RMSE ratios obtained at the end of the minimisation processes are presented in Table 1.
As illustrated by Fig. 4, most of the initial velocity error is corrected throughout data assimilation. Indeed, for the velocity components ($u$ and $v$), the noise energy of the analysed field represents less than 2.5% of the true field energy. One can notice that the angular observation transformation leads to slightly better results than the other observation transformations for all criteria. Therefore, the loss of information induced by the angular transformation is not a problem in a perfect-data context.
In this 'perfect observation' case, each observation transformation gives satisfying results. In particular, it validates the proposed gradient and angular transformations.
In realistic applications, there is no such thing as perfect observations. The next two sections study the impact of data noise on the assimilation process. In Section 5.3, we perform experiments with a Gaussian space- and time-uncorrelated noise, and then in Section 5.4 the impact of spatial correlation is discussed. For each kind of noise, different noise levels are tested. For each level, 10 noise scenarios are generated in order to perform 10 independent experiments.
In order to quantify the reliability of the information available in our image sequences, we refer to the Signal-to-Noise Ratio (SNR), which is computed as:

$SNR = 10\log_{10}\left(\frac{\|y^*\|^2}{\|y^o - y^*\|^2}\right),$

which represents the logarithm of the ratio between the image energy and the noise energy, where $y^*$ represents the true synthetic image and $y^o$ the noisy image. A large SNR ($\approx$ 30-40 dB) means that the image sequence has a good quality, whereas a smaller (or negative) SNR means that the image sequence is strongly corrupted by noise. Figure 5 presents examples of noisy images used in our experiments.
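The SNR computation is straightforward:

```python
import numpy as np

def snr_db(y_true, y_obs):
    """SNR = 10 log10(||y*||^2 / ||y_obs - y*||^2), in decibels."""
    noise_energy = np.sum((y_obs - y_true) ** 2)
    return 10.0 * np.log10(np.sum(y_true ** 2) / noise_energy)
```

For instance, a unit-amplitude image corrupted by a constant offset of 0.1 has an SNR of 20 dB.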

Uncorrelated additive Gaussian white noise
We first consider images corrupted by an additive Gaussian white noise uncorrelated in space and time. The noise is identically distributed with mean 0 and variance $\sigma^2$. We test the robustness of our transformations to different noise scenarios, considering $\sigma \in [0, 0.1]$ (whereas $y^* \in [0.13, 0.86]$). Table 2 presents the ratio between the analysed state RMSE and the background state RMSE for several noise perturbations. One can see that the angular transformation is not well suited to deal with independent Gaussian white noise. Indeed, even with image sequences of good quality, the error on the analysed field remains important. This can be explained by the fact that the normalised gradient field holds less information than the other distances. This is particularly true in flat areas, where the information is not reliable, as taken into account by the covariance matrix modelling (21-23). On top of that, the angular covariance matrix modelling suffers from the fact that no correlations are considered. All those drawbacks could explain why the results obtained with this distance are worse than the results obtained with the other distances (whereas they were the best with perfect data).
Results obtained with the block covariance matrix $R^{iid}_{\nabla,Block}$ of eq. (20), or the scalar matrix $R^{iid}_{\nabla,Scalar}$ of eq. (18), are very different. In this context, introducing some information on the correlation through the covariance matrix leads to a better analysed field for all high noise levels.
To summarise, in the case of additive uncorrelated Gaussian white noise, all the observation transformations are sensitive to the noise amplitude. In this particular case, we cannot conclude that working in one space is better than working in another. Indeed, the results obtained with the pixel or even the gradient transformations (coupled with adequate covariance matrices) are very close to each other. In this case, the use of a diagonal R matrix in the pixel space is not an approximation; therefore, it is difficult to improve the results by using different transformations. However, this does not hold when the noise is correlated in space, as studied in the next section.

Correlated Gaussian noise: homogeneous isotropic case
Let us now consider an additive Gaussian noise that is spatially correlated. The correlation is modelled by:

$\eta = G \ast \beta,$

where $\beta$ is an independent and identically distributed Gaussian white noise of variance $\sigma^2$, $\ast$ is the convolution product symbol and $G$ is a Gaussian filter of size $(2n+1)\times(2n+1)$ such that, for $i,j = -n,\dots,n$:

$G(i,j) = \frac{1}{|G|}\exp\left(-\frac{i^2+j^2}{2\sigma_L^2}\right),$

where $|G| = \sum_{i,j=-n}^{n}\exp\left(-\frac{i^2+j^2}{2\sigma_L^2}\right)$ is a normalisation constant. The correlations are isotropic and homogeneous, which means that the correlation model is exactly the same for each pixel. In our study, the isotropic correlation length of eq. (43) is parameterised with $\sigma_L = 1.5$. Figure 2 shows the true correlation and the correlation modelled with a diagonal approximation of the error covariance matrix in several spaces.
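This noise model can be sketched as follows, assuming a periodic convolution for simplicity (an assumption made here; the filter size n = 4 is also illustrative):

```python
import numpy as np

def gaussian_filter_2d(n, sigma_l):
    """Normalised (2n+1) x (2n+1) Gaussian filter G."""
    ax = np.arange(-n, n + 1)
    xx, yy = np.meshgrid(ax, ax, indexing="ij")
    g = np.exp(-(xx**2 + yy**2) / (2.0 * sigma_l**2))
    return g / g.sum()

def correlated_noise(shape, sigma, sigma_l=1.5, n=4, seed=None):
    """Spatially correlated noise eta = G * beta, with beta i.i.d. N(0, sigma^2).
    The convolution is taken periodic (via FFT) for simplicity."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(0.0, sigma, shape)
    g = gaussian_filter_2d(n, sigma_l)
    kernel = np.zeros(shape)
    kernel[: 2 * n + 1, : 2 * n + 1] = g
    kernel = np.roll(kernel, (-n, -n), axis=(0, 1))  # centre the filter at (0, 0)
    return np.real(np.fft.ifft2(np.fft.fft2(beta) * np.fft.fft2(kernel)))
```

Because the filter is normalised, the smoothing strongly reduces the pointwise variance while introducing a positive correlation between neighbouring pixels, as in Fig. 2.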
We performed various experiments, combining observation transformations, covariance matrix approximations, data thinning or compression. To clarify the presentation of the results, we sorted them into two classes: bijective transformations (pixels, Fourier, wavelet and curvelet), with diagonal covariance matrix approximations (Section 5.4.1); lossy transformations (thinning, gradient, angular, truncated multiscale analysis), with various covariance matrix approximations (Section 5.4.2).

Bijective transformations.
In this paragraph the whole dataset is used, possibly after a change of variables (pixels, Fourier, wavelet, curvelet).
Table 3 presents the RMSE ratio of the zonal component of the velocity with respect to the SNR.
The following remarks can be stated:
(1) We can first see that the pixel distance combined with a diagonal matrix performs poorly, even with a small noise level, and even more so with a large one. In that case, ignoring the error correlations is severely detrimental to the analysis.
(2) On the contrary, using a diagonal error covariance matrix in combination with a change of space reintroduces information about the noise correlation, allowing for much better results and an increased robustness to the noise level, as we can see with the curvelets, Daubechies wavelets and Fourier.
(3) The Haar wavelets performance is disappointing. Indeed, we saw previously (see Fig. 2) that they induce poorly shaped (non-isotropic) correlations, contrary to the other changes of basis. So using wavelets is not a sufficient condition to ensure the recovery of correlations; the choice of the wavelet basis should be made according to the observation noise correlation pattern.
(4) Fourier performs slightly better than the others; this is not surprising since the Fourier representation is well suited to represent isotropic correlations.

Lossy transformations.
A common way to deal with correlated observation errors is to get rid of the correlated part of the signal. This is usually done by data thinning (Liu and Rabier, 2002). The idea is to discard observations if they are too close, in order to make sure that the remaining observations are uncorrelated, so that the diagonal approximation may be valid in the pixel space.
In this paragraph, we investigate various ways of accounting for correlated observation errors through lossy transformations. In addition to data thinning, we also consider the gradient and the angular representations of the signal. Similarly, these transformations remove some part of the signal, but as shown previously [see eq. (31)], they also filter the noise, provided the spatial correlation is strong enough. Finally, we can use another way to transform the information: first we apply a change of variables into wavelets or curvelets, and then we selectively discard some parts of the signal. This operation, called thresholding in the image processing community, consists of keeping only the largest coefficients of the wavelet/curvelet decomposition, and corresponds to a projection on a subspace. Strictly speaking, the aim of such an operation is not to reduce the correlation in the observations, but to compress the data.
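Thinning and thresholding can be sketched as follows; Fourier coefficients are used here as a simple stand-in for the wavelet/curvelet decompositions of the paper (an assumption made for the sake of a self-contained example):

```python
import numpy as np

def thin(obs, step=2):
    """Data thinning: keep every `step`-th pixel in each direction
    (Thinning(2) keeps every other pixel)."""
    return obs[::step, ::step]

def threshold_fourier(img, keep=0.1):
    """Keep only the `keep` fraction of largest-magnitude coefficients
    of a decomposition (Fourier here, standing in for wavelets/curvelets)."""
    c = np.fft.fft2(img)
    mag = np.abs(c)
    cut = np.sort(mag.ravel())[int((1.0 - keep) * mag.size)]
    c[mag < cut] = 0.0
    return np.real(np.fft.ifft2(c))
```

Thresholding is a projection, so the reconstructed image never has more energy than the original; thinning simply subsamples the grid.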
Table 4 presents the final RMSE ratio of the zonal velocity errors (as in Table 3) for those experiments.
As before, we can draw several comments from these results:
(1) Thinning slightly improves the results when we keep half of the pixels in each direction, meaning that the undersampled observations indeed fit the uncorrelated hypothesis better. When the thinning is too strong, however [e.g. for $(1/3)^2$ thinning], it deteriorates the analysis. This can be explained by the scale of our signal: the vortex width is 20 pixels, so that the image resolution is not that high in comparison. Therefore, thinning also discards valuable information, degrading the results as a consequence.
(2) The gradient transformation provides a better way of extracting information than pixels (while still being outperformed by wavelets/curvelets/Fourier, as shown above).
(3) The angular transformation is disappointing in this setting. It is likely that too much information is discarded by this transformation. Moreover, the large flat area featured in this image sequence does not provide exploitable angular information.
(4) Multiscale analysis with thresholding leads to improved results compared with the other lossy transformations. However, when we compare these to Table 3, we can see that thresholding has a small negative impact on the analysis, particularly for the Daubechies wavelets. This approach may still be useful in case the dataset is too large to be handled efficiently by the data assimilation system.
5.4.3. Rate of convergence. Figure 6 shows the behaviour of the cost function (left) during the minimisation and the associated analysis RMS error (right) for three selected experiments (Pix-Scalar, D8-Diag and C-Diag).
The convergence rate is higher in the pixel space than in a wavelet or curvelet space (left panel). However, the minimum reached at convergence is better in the wavelet or curvelet space (right panel).
For those experiments, the RMSE decrease rate is the same in the pixel space and in the wavelet space. For C-Diag, incorporating information on the error correlation leads to a degradation of the RMSE decrease rate. This may be due to the difficulty of properly estimating the diagonal elements of R.
For real applications this can be a problem: if the assimilation process is stopped too soon, using the curvelet signal representation can be worse than using the pixel representation.
5.4.4. Error analysis. Figure 7 presents the analysis error on $v_0$ for different observation transformations. The results are presented for the scenario involving the worst noise level studied in the correlated case (see Fig. 5 and Sections 5.4.1 and 5.4.2). This confirms the good performance of Fourier, curvelet and Daubechies. One can add the following remarks:
- The Fourier, curvelet and Daubechies analysis errors are mostly located near the vortex area. Indeed, the analysed velocity field is too smooth to represent correctly the singularities around the vortex.
- The pixel, gradient and angular analysis errors are not limited to the vortex area. This suggests that it is hard to distinguish the error from the signal when poor information about the error structure is provided.
- The analysis error when using the Haar basis is important on the whole domain. This is another illustration of the (relative) shortcoming of the diagonal hypothesis in this basis for this kind of noise.

5.4.5. Summary. In this section we showed that it is possible to account for homogeneous and isotropic observation noise using various data transformations coupled with diagonal approximations of the error correlation matrices.
Indeed, the diagonal matrix associated with the Fourier (resp. curvelet or Daubechies wavelet) transform implicitly takes into account the noise spatial correlations, without requiring computationally expensive matrix inversions.
Simpler solutions have also been presented. Among them, considering the signal in a gradient space seems to be an easier yet valid approach.

Correlated Gaussian noise: an anisotropic and inhomogeneous case
Multiscale transformations coupled with non-constant diagonal error correlation matrix approximations gave good results for isotropic correlated noise. The curvelet transform did not bring improvement over Fourier and wavelets, despite being more complex and expensive. This is not surprising since a homogeneous and isotropic noise can be represented by a diagonal matrix in Fourier space. Therefore, in the previous case, nothing could be gained using other transforms, since the Fourier representation (associated with a diagonal matrix) is optimal. However, curvelets provide a flexible way of dealing with more generic noise structures. In this section, we consider an anisotropic and inhomogeneous Gaussian noise. Figure 8 presents an example of a noise realisation, as well as its impact on a tracer observation. The main direction as well as the length scale of the noise correlation function differ from one pixel to another.
Table 5 presents the RMSE ratios at the end of the analysis process. As for the isotropic case, it is better to take into account an approximation of the error correlation, whatever the noise level. Indeed, when the signal is considered in the Daubechies wavelet basis, in the Fourier space or in the curvelet space, the error at the end of the assimilation is considerably reduced (compared to the pixel case).
However, unlike in the isotropic case, the curvelet representation outperforms the wavelet and Fourier ones. Indeed, each atom of the curvelet family has three characteristics: localisation, scale and orientation. The latter allows for a better representation of anisotropy.

Conclusion
This paper first presents two original observation transformations dedicated to the assimilation of image-type observations. Both are based on the gradient of the image, in norm and in orientation, respectively. They showed competitive performances on an academic test case compared to previously introduced metrics and are therefore good candidates for more realistic applications. In the future, it could be interesting to study combinations of such metrics in order to benefit from their respective merits.
The second and main part of this paper highlights the importance of a good specification of the observation error covariance matrix in data assimilation. Indeed, this is a crucial point for image-type data, but it is likely to be of significant importance for other kinds of data as well. This was illustrated through the study of two kinds of observation noise. The first is an uncorrelated additive Gaussian noise, for which all operators (classical and new) show similar robustness to the noise level (except for the angular operator, which suffers from the variance specification), and none of them seems any better or worse. This is a favourable case that is seldom encountered in practice. More realistically, a second set of experiments is performed applying a correlated additive Gaussian noise to the data. In that case, it appears clearly that it is important to improve the observation error covariance matrix specification and to avoid the use of scalar ones (i.e. $R = \sigma^2 Id$), as is commonly done in practice. Interestingly, this can be done at a cheap cost by combining a variable transformation with a non-constant diagonal approximation of the covariance matrix. Encouraging results are obtained with the Fourier and wavelet transforms for the isotropic noise and with the curvelet transform in the anisotropic case. It is also pointed out that the specification of the covariance matrices should obviously be related to the actual observation noise, and that it is not just a matter of using a non-diagonal R.
This last remark means that a careful study of the observation error has to be done prior to assimilation. In this paper, we assumed that R is known in the pixel space, but in practice, how could this work be adapted to cases where R must be modelled from scratch? On that point, a wide literature exists in different fields, such as image processing and statistics, which aims to estimate and/or model the variance terms in a different space; we can for example mention the interesting work of Nguyen van Yen et al. (2012), which iteratively estimates the wavelet variance scale by scale. In variational data assimilation, the balance between the background and observation error matrices B and R is a subtle alchemy; improving the description of R should allow for a more consistent specification of B.
The solutions proposed in this paper are dedicated to image observations or, more generally, to dense observations that can be considered to lie on a given grid. It is natural to extend the proposed solutions to errors correlated in time (image sequences, for instance), using 3-D transformations (2-D + time). They can also be extended to sparser and randomly spaced data, but the definition of the appropriate transformation then becomes difficult and will change from one observation time to the next.
Another crucial point to sort out prior to operational use, specific to image-type observations, is the problem of occultation (a cloud passing over ocean surface data, for instance) and its impact on the error statistics. Again, the multiscale descriptions, or combinations of the metrics presented in this paper, may offer a suitable framework for dealing with such a problem. Some preliminary results can be found in Chabot (2014).

Appendix A
Covariance matrix for gradient additive noise

A1 Block approximation

Recalling the gradient observation transformation, a possible discrete implementation of the gradient $\nabla g = [g_x, g_y]^T$ at pixel $(i,j)$, with $i = 1\cdots L$ and $j = 1\cdots H$, is given by centred differences:

$g_x(i,j) = \frac{q(i+1,j) - q(i-1,j)}{2}, \qquad g_y(i,j) = \frac{q(i,j+1) - q(i,j-1)}{2}.$

Even if the noise is a white Gaussian additive noise, there are some correlations between the gradient components. Indeed, when $\eta_x(i,j)$ is inside the domain boundary, it is correlated with $\eta_x(i+2,j)$, $\eta_x(i-2,j)$, $\eta_y(i+1,j+1)$, $\eta_y(i-1,j+1)$, $\eta_y(i+1,j-1)$ and $\eta_y(i-1,j-1)$.
As illustrated in Fig. 9, the covariance coefficients are:

$\mathrm{cov}\left(g_x(i,j), g_y(i-1,j+1)\right) = \mathrm{cov}\left(g_x(i,j), g_y(i+1,j-1)\right) = \frac{\sigma^2}{4},$

$\mathrm{cov}\left(g_x(i,j), g_x(i+2,j)\right) = \mathrm{cov}\left(g_x(i,j), g_x(i-2,j)\right) = \mathrm{cov}\left(g_x(i,j), g_y(i+1,j+1)\right) = \mathrm{cov}\left(g_x(i,j), g_y(i-1,j-1)\right) = -\frac{\sigma^2}{4},$

which leads to a non-invertible R matrix written as:

$R = \begin{pmatrix} C_{g_x g_x} & C_{g_x g_y} \\ C_{g_y g_x} & C_{g_y g_y} \end{pmatrix},$

where $C_{g_x g_x}$ is the error covariance matrix between the x-derivative elements. Simplifying this covariance matrix by neglecting the correlations between x- and y-derivative elements leads to considering $C_{g_x g_y} = 0$. This simplification is illustrated in Fig. 10. One consequence is that one element $\eta_x$ (resp. $\eta_y$) is only correlated to elements of the same row (resp. column). This approximation leads to the following block matrix $\tilde{R}$:

$\tilde{R} = \begin{pmatrix} C_{g_x g_x} & 0 \\ 0 & C_{g_y g_y} \end{pmatrix}.$

Its inverse exists and can be computed block by block.

Fig. 9. Covariance between $\eta_x(i,j)$ (in red) and other elements (in green and blue) in the gradient space for an independent and identically distributed Gaussian white noise in the pixel space. The blue (resp. green) points represent the x- (resp. y-) derivative elements correlated to $\eta_x(i,j)$. All the covariances between $\eta_x(i,j)$ and other points are represented.
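The covariance values above can be checked by Monte Carlo, under the centred-difference convention assumed in this appendix (the sample sizes and tolerances below are arbitrary choices for the check):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0

# i.i.d. pixel noise on many 1-D strips; centred differences along x:
# g_x(i) = (eta(i+1) - eta(i-1)) / 2
eta = rng.normal(0.0, sigma, (20000, 7))
g_x = (eta[:, 2:] - eta[:, :-2]) / 2.0

var_gx = np.var(g_x[:, 2])                 # expected sigma^2 / 2
cov_lag2 = np.mean(g_x[:, 2] * g_x[:, 4])  # expected -sigma^2 / 4
cov_lag1 = np.mean(g_x[:, 2] * g_x[:, 3])  # expected 0 (disjoint stencils)
```

Each centred difference has variance $\sigma^2/2$; two differences sharing one pixel (lag 2) are anticorrelated with covariance $-\sigma^2/4$; differences at lag 1 share no pixel and are uncorrelated, which is why only the lag-2 and diagonal neighbours appear in Fig. 9.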

Fig. 1 .
Fig. 1. Different examples of satellite images. (a) Altimetric reconstruction from JASON satellite data. (b) Ocean colour/chlorophyll from the MODIS sensor of the ENVISAT satellite. (c) Sea surface temperature from the MODIS sensor of the ENVISAT satellite.
$R^{cor}_{D8,Diag}$ for the Daubechies basis (27); $R^{cor}_{Haar,Diag}$ for the Haar basis (28).

Fig. 3 .Fig. 4 .
Fig. 3. Initial values of the scenario chosen for the twin experiments. The zonal and meridional velocities $u^*_0$ and $v^*_0$ are characterised by a strong vortex, illustrated by the initial vorticity $\zeta_0$. The height $h^*_0$ is assumed almost flat. The velocities take their values from $-0.03$ m s$^{-1}$ (blue) to $0.04$ m s$^{-1}$ (red) and the 2-D vorticity from $-0.2$ s$^{-1}$ (blue) to $0.64$ s$^{-1}$ (red). This synthetic initialisation has been created in order to match correctly the initialisation of the passive tracer $q^*_0$.

Fig. 5 .
Fig. 5. First image of the sequence used for the assimilation, for various noise levels. The tracer concentrations vary from 0 (blue) to 1 (red). Top left: perfect data scenario. Top right: worst scenario tested for additive correlated Gaussian noise (SNR = 14.8 dB). Bottom left: worst scenario for Gaussian white noise (SNR = 6.7 dB). Bottom right: best scenario for an additive correlated noise (SNR = 26.8 dB).

Fig. 6 .
Fig. 6. Mean (over 10 experiments) cost function evolution with respect to 4-D-Var iterations (left). Mean (over 10 experiments) evolution of the RMSE ratio of the v component of the velocity with respect to 4-D-Var iterations (right). Convergence rates are plotted for an isotropic noise. The noise magnitude is the smallest considered in the various experiments (26.8 dB).

Fig. 7 .
Fig. 7. Error analysis on the v-component of the velocity using the worst correlated image sequence studied. The velocity errors range from $-0.0075$ m s$^{-1}$ (blue) to $0.0075$ m s$^{-1}$ (red). The true velocity field (see Fig. 3) ranges from $-0.025$ m s$^{-1}$ to $0.0405$ m s$^{-1}$.
The first block corresponds to the covariance between the x-derivative elements of the first row.

A2 Inversion of $\tilde{R}$

Apart from the boundary elements, a block $C_{g_x^{i=k} g_x^{i=k}}$ of the matrix $\tilde{R}$ can be separated into two blocks depending on the parity of $j$. Taking into account that all blocks are identical, we only need to invert four tridiagonal matrices, such as $C_{g_x^{odd} g_x^{odd}}$.

Fig. 10 .
Fig. 10. Illustration of the simplification of the error covariance matrix. Only correlations between derivatives in the same direction are considered.

Table 1 .
Ratio between analysed state RMSE and background state RMSE for perfect data

Table 2 .
Ratio between analysed state RMSE and background state RMSE, for the independent additive noise scenarios (Section 5.3). This ratio is computed after assimilation starting from a background at rest.

Table 4 .
Ratio between analysed state RMSE and background state RMSE for data with spatially correlated additive noise of decreasing standard deviation. The variable of interest is u. Lossy image transformations are applied in this case (Section 5.4.2). Thinning(2) means that, in each direction, every other pixel is used. Thinning(3) means that, in each direction, every third pixel is used.

Table 5 .
Ratio between analysed state RMSE and background state RMSE for data with spatially correlated additive noise of decreasing standard deviation. In this case an anisotropic and inhomogeneous noise (Section 5.5, see Fig. 8) is applied to the observations. The variable of interest is u.