
The authors hereby state that author Marco Congedo is currently an Academic Editor of PLOS ONE. This does not alter the authors' adherence to PLOS ONE Editorial policies and criteria.

Conceived and designed the experiments: MC BA AB MM. Performed the experiments: MC AB. Analyzed the data: MC AB. Contributed reagents/materials/analysis tools: MC AB. Wrote the paper: MC BA AB MM.

We explore the connection between two problems that have arisen independently in signal processing and related fields: the estimation of the geometric mean of a set of symmetric positive definite (SPD) matrices and their approximate joint diagonalization (AJD). Today there is considerable interest in estimating the geometric mean of a SPD matrix set in the manifold of SPD matrices endowed with the Fisher information metric. The resulting mean has several important invariance properties and has proven very useful in diverse engineering applications such as biomedical and image data processing. While for two SPD matrices the mean has an algebraic closed-form solution, for a set of more than two SPD matrices it can only be estimated by iterative algorithms. However, none of the existing iterative algorithms features at the same time fast convergence, low computational complexity per iteration and a guarantee of convergence. For this reason, other definitions of the geometric mean based on symmetric divergence measures, such as the Bhattacharyya divergence, have recently been considered. The resulting means, although possibly useful in practice, do not satisfy all desirable invariance properties. In this paper we consider geometric means of covariance matrices estimated on high-dimensional time-series, assuming that the data is generated according to an instantaneous mixing model, which is very common in signal processing. We show that in these circumstances we can approximate the Fisher information geometric mean by employing an efficient AJD algorithm. Our approximation is in general much closer to the Fisher information geometric mean than its competitors and satisfies many invariance properties. Furthermore, convergence is guaranteed, the computational complexity is low and the convergence rate is quadratic. The accuracy of this new geometric mean approximation is demonstrated by means of simulations.

The study of distance measures between symmetric positive definite (SPD) matrices and of the definition of a center of mass for a number of them has recently been growing very fast, driven by practical problems in radar data processing, image processing, computer vision, shape analysis, medical imaging (especially diffusion MRI and brain-computer interfaces), sensor networks, elasticity, mechanics, numerical analysis and machine learning (e.g., [

The set of SPD matrices of a given dimension forms a smooth manifold, which we denote by M.

In this article we introduce a new approximation to the FI mean springing from the study of the relation between the geometric mean of a set of SPD matrices and its approximate joint diagonalization [

In the following sections we introduce the notation and nomenclature. We then review some concepts of Riemannian geometry and the relevant metrics used to define a geometric mean, and establish the connection between the geometric mean of a SPD matrix set and their approximate joint diagonalizer. We introduce our approximation and study its properties. In the Results section we illustrate the accuracy of our approximation by means of simulations. Finally, we briefly discuss an on-line implementation and conclude.

In the following we will indicate matrices by upper-case italic characters (A, B, C, …). The notations tr(·), det(·), (·)^T and ‖·‖_F stand for the trace, the determinant, the transpose and the Frobenius norm of a matrix, respectively. The operator diag(·) returns the diagonal part of its matrix argument. The identity matrix is denoted by I. A set of K SPD matrices is denoted by {C_1, …, C_K}, or shortly {C_k}. An asymmetric divergence from SPD matrix C_2 to SPD matrix C_1 will be denoted δ(C_1 ← C_2), whereas a symmetric distance or divergence between two SPD matrices will be denoted δ(C_1 ↔ C_2). The lambda symbol, as in λ_n(C), denotes the n-th eigenvalue of matrix C, and λ_n^2(C) = (λ_n(C))^2. We will make extensive use of symmetric functions of eigenvalues of SPD matrices. For a symmetric matrix with eigenvalue decomposition C = U Λ U^T, where U holds the eigenvectors in its columns and Λ the eigenvalues on its diagonal, we define the inverse C^{-1} = U Λ^{-1} U^T, the symmetric square root C^{1/2} = U Λ^{1/2} U^T, the symmetric square root inverse C^{-1/2} = U Λ^{-1/2} U^T, the logarithm ln(C) = U ln(Λ) U^T and the exponential exp(C) = U exp(Λ) U^T.
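As an illustration, these eigenvalue-based symmetric functions can all be obtained from a single eigenvalue decomposition. A minimal Python/NumPy sketch (the helper name `spd_fun` is our own choice, not notation from the text):

```python
import numpy as np

def spd_fun(C, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix:
    C = U diag(w) U^T  ->  U diag(fun(w)) U^T."""
    w, U = np.linalg.eigh(C)
    return (U * fun(w)) @ U.T

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 100))
C = X @ X.T / 100                               # an SPD covariance matrix

C_sqrt = spd_fun(C, np.sqrt)                    # symmetric square root C^{1/2}
C_isqrt = spd_fun(C, lambda w: 1 / np.sqrt(w))  # C^{-1/2}
C_log = spd_fun(C, np.log)                      # matrix logarithm ln(C)

# sanity checks: C^{1/2} C^{1/2} = C and C^{1/2} C^{-1/2} = I
assert np.allclose(C_sqrt @ C_sqrt, C)
assert np.allclose(C_sqrt @ C_isqrt, np.eye(4))
```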

In many engineering applications we are confronted with multivariate observations obeying a linear instantaneous mixture generative model. For example, in electroencephalography (EEG) we observe N time-series of scalp electric potentials, typically sampled a few hundred times per second. Let x(t) ∈ ℝ^N be the multivariate vector holding the observed data, in our example the electric potentials recorded at the N scalp electrodes at time t. Vector s(t) ∈ ℝ^P, with P ≤ N, holds the time-series of the P cerebral sources to be estimated, n(t) ∈ ℝ^N is a noise (model error) term, assumed uncorrelated with the sources, and A ∈ ℝ^{N×P} is the full column rank, time-invariant mixing matrix, so that the model reads x(t) = A s(t) + n(t). Matrices Λ_k ∈ ℝ^{P×P} here represent second-order statistics (SOS) matrices of the unknown P source processes. If the source processes are uncorrelated (the Λ_k are diagonal), it can then be shown that the approximate joint diagonalization (AJD) of a set of K SOS matrices estimated on the observed data allows the estimation of the mixing matrix A.

The joint diagonalizer (JD) of two SPD matrices C_1 and C_2 is a matrix B such that B C_1 B^T and B C_2 B^T are diagonal matrices. The JD is not unique, since if B is a JD of C_1 and C_2, so is D P B for any invertible diagonal matrix D and permutation matrix P. An orthogonal JD exists if and only if C_1 and C_2 commute in multiplication ([ ]); in general, a JD is given by the transposed eigenvector matrix of C_1^{-1} C_2. Letting C_1, C_2 be appropriate SOS matrices estimated from data generated by the mixing model above, the JD provides an estimate of the inverse of the mixing matrix.
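For two SPD matrices the JD can be computed via a generalized eigenvalue solver. A minimal Python/SciPy sketch of this standard construction (not the paper's own code):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
L1 = np.diag(rng.uniform(0.5, 2.0, 4))
L2 = np.diag(rng.uniform(0.5, 2.0, 4))
C1, C2 = A @ L1 @ A.T, A @ L2 @ A.T   # two SPD matrices sharing a congruence

# Generalized eigenvectors V of the pencil (C2, C1) satisfy C2 v = lambda C1 v,
# with V^T C1 V = I and V^T C2 V diagonal, so B = V^T is a joint diagonalizer.
w, V = eigh(C2, C1)
B = V.T
D1, D2 = B @ C1 @ B.T, B @ C2 @ B.T

# both congruence-transformed matrices are (numerically) diagonal
assert np.allclose(D1, np.diag(np.diag(D1)), atol=1e-10)
assert np.allclose(D2, np.diag(np.diag(D2)), atol=1e-10)
```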

In general, for a set of K > 2 SPD matrices {C_1, …, C_K} there does not exist a matrix diagonalizing all of them exactly. However, one may seek a matrix diagonalizing the set as much as possible, that is, a matrix B minimizing the overall departure from diagonality of the K matrices B C_k B^T, measured by the criterion

    Σ_k w_k [ln det diag(B C_k B^T) − ln det(B C_k B^T)],

where the w_k are optional non-negative real numbers weighting the diagonalization effort with respect to the input matrices C_k. Besides being specific to SPD matrices, this criterion possesses remarkable invariance properties, which we now establish.

The criterion above is invariant under rescaling of the input matrices C_k; thus the AJD matrix of {w_1 C_1, …, w_K C_K} is the same as the AJD matrix of {C_1, …, C_K} for any positive set {w_1, …, w_K}.

The proof is trivial using well-known properties of the determinant and of the logarithm and will be omitted here. For AJD solutions according to a given criterion, we make use of the following:

The AJD of set {C_1, …, C_K} according to some AJD criterion is essentially unique if any two minimizers B_1 and B_2 of the criterion are in relation B_1 = D P B_2, where D is an invertible diagonal matrix and P a permutation matrix.

For details on the essential uniqueness of AJD see [

For any invertible matrix F, if B is a well-defined AJD of the set {F C_1 F^T, …, F C_K F^T}, then B F is a well-defined AJD of {C_1, …, C_K}, out of the usual ambiguity D P, where D is an invertible diagonal matrix and P a permutation matrix.

Saying that B is the AJD of {F C_1 F^T, …, F C_K F^T} according to some AJD criterion implies that the set {B F C_1 F^T B^T, …, B F C_K F^T B^T} is a global minimizer of the AJD criterion employed. Thus, matrix B F is an AJD of {C_1, …, C_K}.

Finally, we will need the following:

Let B be an AJD of the set {C_1, …, C_K}; an AJD criterion is said to verify the self-duality invariance if D B^{-T} is a well-defined AJD of the set {C_1^{-1}, …, C_K^{-1}} satisfying the same criterion, with D an invertible diagonal matrix.

Geometrically, the set of SPD matrices of dimension N × N can be considered as a (1/2)N(N+1)-dimensional open convex pointed cone embedded in the Euclidean space of symmetric matrices endowed with the Frobenius norm ‖·‖_F. We will replace the convex pointed cone in this vector space by a Riemannian manifold M, with a tangent space T_Ω M defined at each point Ω of M. The Fisher information metric equips the tangent space T_Ω M at point Ω with an inner product: for two tangent vectors Z_1 and Z_2 in the tangent space, the inner product through point Ω is ⟨Z_1, Z_2⟩_Ω = tr(Ω^{-1} Z_1 Ω^{-1} Z_2).

This topology is easily visualized in the case of 2 × 2 matrices; any 2 × 2 covariance matrix can be seen as a point in 3D Euclidean space, with two coordinates given by the two variances (the diagonal elements) and the third coordinate given by the covariance (either one of the two equal off-diagonal elements). By construction a covariance matrix must stay within the cone boundaries. As soon as the point touches the boundary of the cone, the matrix is no longer positive definite.

Consider a point Ω on the manifold M and the tangent space T_Ω M constructed at that point.

The Fisher information metric allows us to measure the length of curves in M and to find the shortest curve between two points, named the geodesic.

The exponential and logarithmic maps are shown graphically in the figure. The exponential map takes a tangent vector Z in T_Ω M to the point of M reached by the geodesic departing from Ω in the direction of Z. It is given by

    Exp_Ω(Z) = Ω^{1/2} exp(Ω^{-1/2} Z Ω^{-1/2}) Ω^{1/2}.

The inverse operation is the function mapping the geodesic joining Ω to a point C back into the tangent space T_Ω M. It is named the logarithmic map and is given by

    Log_Ω(C) = Ω^{1/2} ln(Ω^{-1/2} C Ω^{-1/2}) Ω^{1/2}.

Given two points C_1 and C_2 on the manifold M, the Fisher information distance between them is

    δ_R(C_1 ↔ C_2) = ‖ln(C_1^{-1/2} C_2 C_1^{-1/2})‖_F = sqrt(Σ_n ln^2 λ_n),

where λ_1, …, λ_N are the eigenvalues of either matrix C_1^{-1} C_2 or C_2^{-1} C_1.

The Riemannian norm (the FI distance from the identity) is zero only for the identity matrix, while the Frobenius norm is zero only for the null matrix. An eigenvalue either smaller or greater than 1 increases the Riemannian norm, and the norm goes to infinity as any eigenvalue goes to either infinity or zero. Importantly, because of the square of the logarithm, an eigenvalue equal to λ and an eigenvalue equal to 1/λ contribute equally to the norm.
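Since the distance depends only on the eigenvalues of C_1^{-1}C_2, it can be computed with a generalized eigenvalue solver and no matrix square roots. A minimal Python/SciPy sketch (the function name `fisher_distance` is ours):

```python
import numpy as np
from scipy.linalg import eigh

def fisher_distance(C1, C2):
    """delta_R(C1 <-> C2) = sqrt(sum_n ln^2 lambda_n), where the lambda_n
    are the generalized eigenvalues of the pencil (C2, C1)."""
    lam = eigh(C2, C1, eigvals_only=True)
    return np.sqrt(np.sum(np.log(lam) ** 2))

C = np.diag([1.0, 4.0, 9.0])
I = np.eye(3)
# the Riemannian norm of C: sqrt(ln^2 1 + ln^2 4 + ln^2 9)
assert np.isclose(fisher_distance(I, C),
                  np.sqrt(np.log(4) ** 2 + np.log(9) ** 2))
# invariance under inversion (property 17)
assert np.isclose(fisher_distance(I, C), fisher_distance(I, np.linalg.inv(C)))
```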

The semiaxes of the ellipsoid are proportional to the square root of the eigenvalues of the covariance matrix. If we ask how far the ellipsoid is from the circle, we are asking for the Riemannian norm, that is, the FI distance from the identity.

(13) Positivity: δ_R(C_1 ↔ C_2) ≥ 0, with equality iff C_1 = C_2

(14) Symmetry: δ_R(C_1 ↔ C_2) = δ_R(C_2 ↔ C_1)

(15) Congruence invariance: δ_R(C_1 ↔ C_2) = δ_R(B C_1 B^T ↔ B C_2 B^T), for any invertible B

(16) Similarity invariance: δ_R(C_1 ↔ C_2) = δ_R(B C_1 B^{-1} ↔ B C_2 B^{-1}), for any invertible B

(17) Invariance under inversion: δ_R(C_1 ↔ C_2) = δ_R(C_1^{-1} ↔ C_2^{-1})

(18) Proportionality: δ_R(C_1 ↔ γ_β(C_1 ↔ C_2)) = β δ_R(C_1 ↔ C_2), where γ_β(C_1 ↔ C_2) is the point at arc-length parameter β ∈ [0, 1] along the geodesic from C_1 to C_2

(19) Convexity: δ_R(γ_β(C_1 ↔ C_2) ↔ γ_β(C_3 ↔ C_4)) ≤ (1 − β) δ_R(C_1 ↔ C_3) + β δ_R(C_2 ↔ C_4)

(20) δ_R(C_1 ↔ C_2) ≥ ‖ln(C_1) − ln(C_2)‖_F, with equality iff C_1 and C_2 commute in multiplication

Given a set of SPD matrices {C_1, …, C_K}, in analogy with the arithmetic mean of random variables, a straightforward definition of the matrix arithmetic mean is

    A{C_1, …, C_K} = (1/K) Σ_k C_k.

On the other hand, a straightforward definition of the geometric mean is far from obvious because the matrices of the set in general do not commute in multiplication. Researchers have postulated a number of desirable properties a mean should possess. Ten such properties are known in the literature as the ALM properties, from the seminal paper by Ando, Li and Mathias [ ]. Following the Fréchet formulation, one defines the geometric mean G{C_1, …, C_K} of K SPD matrices C_k as the matrix satisfying

    G{C_1, …, C_K} = argmin_G Σ_k δ^2(C_k ↔ G),

where δ^2(⋅↔⋅) is an appropriate squared distance. In words, the geometric mean is the matrix minimizing the sum of the squared distances of all elements of the set to itself. Using the Fisher information (FI) distance we obtain the FI mean.

Properties of the geometric mean:

(25) Invariance by reordering: the mean does not depend on the order of the matrices in the set

(26) Congruence invariance: G{B C_1 B^T, …, B C_K B^T} = B G{C_1, …, C_K} B^T, for any invertible B

(27) Self-duality: G{C_1^{-1}, …, C_K^{-1}} = G{C_1, …, C_K}^{-1}

(28) Joint homogeneity: G{w_1 C_1, …, w_K C_K} = (w_1 ⋯ w_K)^{1/K} G{C_1, …, C_K}, with w_k ≥ 0

(29) Determinant identity: det G{C_1, …, C_K} = (det(C_1) ⋯ det(C_K))^{1/K}

(30) If all matrices C_k pair-wise commute, then G{C_1, …, C_K} = (C_1 ⋯ C_K)^{1/K}

Given two points C_1 and C_2 on the manifold M, their geometric mean, indicated in the literature by C_1 # C_2, has several equivalent closed-form expressions, such as

    C_1 # C_2 = C_1^{1/2} (C_1^{-1/2} C_2 C_1^{-1/2})^{1/2} C_1^{1/2}.

In the above, the indices 1 and 2 can be switched to obtain as many more expressions. The geometric mean of two SPD matrices is indeed the midpoint of the geodesic connecting them.
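The closed-form expression translates directly into code. A short Python/NumPy sketch (helper names are ours):

```python
import numpy as np

def spd_power(C, p):
    """C^p through the eigenvalue decomposition of a symmetric matrix."""
    w, U = np.linalg.eigh(C)
    return (U * w ** p) @ U.T

def geomean2(C1, C2):
    """C1 # C2 = C1^{1/2} (C1^{-1/2} C2 C1^{-1/2})^{1/2} C1^{1/2}."""
    R, Ri = spd_power(C1, 0.5), spd_power(C1, -0.5)
    return R @ spd_power(Ri @ C2 @ Ri, 0.5) @ R

C1 = np.diag([1.0, 16.0])
C2 = np.diag([9.0, 4.0])
G = geomean2(C1, C2)
# for commuting matrices the mean reduces to the element-wise geometric mean
assert np.allclose(G, np.diag([3.0, 8.0]))
# the two arguments can be switched (symmetry of the mean)
assert np.allclose(G, geomean2(C2, C1))
```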

Our investigation has started with the following:

The FI geometric mean of two SPD matrices can be expressed uniquely in terms of their joint diagonalizer: let B be a JD of {C_1, C_2} such that B C_1 B^T = Λ_1 and B C_2 B^T = Λ_2, and let A = B^{-1}; then the geometric mean is

    C_1 # C_2 = A (Λ_1 Λ_2)^{1/2} A^T.

Since diagonal matrices commute in multiplication, using the congruence invariance and property (30) we have B (C_1 # C_2) B^T = Λ_1 # Λ_2 = (Λ_1 Λ_2)^{1/2}. Conjugating both sides by A = B^{-1} we obtain the result.

A consequence of the scaling indeterminacy of the JD and of the above proposition is that, after appropriate scaling, the inverse of the joint diagonalizer of two SPD matrices is a square root of their geometric mean.

Given a set {C_1, …, C_K} = {C_k} of K SPD matrices, the FI geometric mean G_R{C_k} is the unique SPD matrix G satisfying

    Σ_k ln(G^{-1/2} C_k G^{-1/2}) = 0.

To estimate G_R{C_k}, we initialize the iterative algorithms below with the arithmetic mean of the set.

Consider two points C_1 and C_2 on M and construct the tangent space T_Ω M at their geometric mean Ω: there the logarithmic maps of the two points sum to zero, as the above equation requires. This motivates the following gradient descent algorithm.

Initialize M with the arithmetic mean of the set {C_k}.

Set ε equal to a suitable machine floating-point precision number (e.g., 10^{-9} for double precision), υ = 1 and τ equal to the highest real number representable by the machine.

Repeat
    M ← M^{1/2} exp(υ K^{-1} Σ_k ln(M^{-1/2} C_k M^{-1/2})) M^{1/2},
    reducing the step size υ whenever the norm of the gradient K^{-1} Σ_k ln(M^{-1/2} C_k M^{-1/2}) exceeds its previous value τ,
Until the norm of the gradient falls below ε.
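A minimal Python/NumPy sketch of such a gradient descent iteration, with the step size fixed at υ = 1 for simplicity (the step-size adaptation is omitted, and helper names are ours):

```python
import numpy as np

def spd_fun(C, fun):
    """C = U diag(w) U^T -> U diag(fun(w)) U^T."""
    w, U = np.linalg.eigh(C)
    return (U * fun(w)) @ U.T

def fi_mean(Cs, tol=1e-10, max_iter=100):
    """Fixed-point iteration for the FI mean:
    M <- M^{1/2} exp((1/K) sum_k ln(M^{-1/2} C_k M^{-1/2})) M^{1/2}."""
    M = np.mean(Cs, axis=0)  # initialize with the arithmetic mean
    for _ in range(max_iter):
        R = spd_fun(M, np.sqrt)
        Ri = spd_fun(M, lambda w: 1 / np.sqrt(w))
        S = np.mean([spd_fun(Ri @ C @ Ri, np.log) for C in Cs], axis=0)
        M = R @ spd_fun(S, np.exp) @ R
        if np.linalg.norm(S) < tol:  # gradient norm as stopping criterion
            break
    return M

Cs = [np.diag([1.0, 8.0]), np.diag([4.0, 2.0]), np.diag([16.0, 4.0])]
M = fi_mean(Cs)
# commuting case: the FI mean equals the element-wise geometric mean
assert np.allclose(M, np.diag([4.0, 4.0]))
```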

The second algorithm we consider is the aforementioned majorization-minimization algorithm recently proposed in [

Initialize M with the arithmetic mean of the set {C_k}.

Repeat

Until Convergence

Recently it has been proposed to use the least-squares (Fréchet) geometric mean associated with the log-Euclidean distance

    δ_LE(C_1 ↔ C_2) = ‖ln(C_1) − ln(C_2)‖_F.

The log-Euclidean distance coincides with the FI distance when C_1 and C_2 commute in multiplication. The interest of this distance is that the associated geometric (Fréchet) mean has the closed-form expression

    G_LE{C_1, …, C_K} = exp(K^{-1} Σ_k ln(C_k)).

This is a direct generalization of the geometric mean of positive scalars and, again, from property (48), it equals the FI mean if all matrices in the set pair-wise commute. The computation of this mean requires only K matrix logarithms and one matrix exponential, with no iterations.
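The log-Euclidean mean can be sketched in a few lines of Python/NumPy, using an eigendecomposition-based matrix logarithm and exponential (helper names are ours):

```python
import numpy as np

def spd_fun(C, fun):
    """C = U diag(w) U^T -> U diag(fun(w)) U^T."""
    w, U = np.linalg.eigh(C)
    return (U * fun(w)) @ U.T

def log_euclidean_mean(Cs):
    """exp((1/K) sum_k ln C_k): the Frechet mean for the log-Euclidean distance."""
    L = np.mean([spd_fun(C, np.log) for C in Cs], axis=0)
    return spd_fun(L, np.exp)

Cs = [np.diag([1.0, 8.0]), np.diag([4.0, 2.0]), np.diag([16.0, 4.0])]
M = log_euclidean_mean(Cs)
# commuting case: the log-Euclidean mean equals the FI mean,
# here the element-wise geometric mean
assert np.allclose(M, np.diag([4.0, 4.0]))
```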

In [ ] a symmetric divergence has been proposed as a distance measure for SPD matrices; the Bhattacharyya divergence of S_1, S_2 ∈ ℝ^{N×N} behaves similarly to the FI distance if the matrices are close to each other. It is symmetric as a distance; however, it does not verify the triangle inequality. This divergence is

    δ_B(S_1 ↔ S_2) = sqrt( ln det((S_1 + S_2)/2) − (1/2) ln det(S_1 S_2) ) = sqrt( Σ_n ln[(1 + λ_n)/(2 sqrt(λ_n))] ),

where the λ_n are, again, the N eigenvalues of either matrix S_1^{-1} S_2 or S_2^{-1} S_1.
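In the log-determinant form, the squared divergence is evaluated with three determinants and no eigendecomposition. A Python/NumPy sketch (we compute the squared divergence, and the function name is ours):

```python
import numpy as np

def bhattacharyya_sq(S1, S2):
    """Squared symmetric log-det (Bhattacharyya) divergence:
    ln det((S1 + S2)/2) - (1/2) ln det(S1 S2)."""
    _, ld_mid = np.linalg.slogdet((S1 + S2) / 2)
    _, ld_1 = np.linalg.slogdet(S1)
    _, ld_2 = np.linalg.slogdet(S2)
    return ld_mid - 0.5 * (ld_1 + ld_2)

S1, S2 = np.diag([1.0, 4.0]), np.diag([4.0, 1.0])
d = bhattacharyya_sq(S1, S2)
assert np.isclose(d, bhattacharyya_sq(S2, S1))       # symmetric
assert np.isclose(bhattacharyya_sq(S1, S1), 0.0)     # zero iff equal
assert d > 0                                         # positivity
```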

In order to estimate

Let

Our idea is to reach an approximation of the FI mean by means of AJD. If the matrices B C_k B^T are nearly diagonal, then they nearly commute in multiplication, hence we can employ property (54) to approximate the geometric mean of the set. Let B be an AJD of {C_1, …, C_K} (definition 1) chosen so as to minimize the criterion above. If the matrices B C_k B^T are exactly diagonal, then the family of means defined by

    G = A exp(Σ_k w_k ln(B C_k B^T)) A^T, with A = B^{-1},

coincides with the FI mean, just as for the case K = 2 in the proposition above. In particular, for K = 2, choosing A = B^{-1} = G^{1/2} and B = G^{-1/2} we obtain the global minimum of the FI mean criterion.

The family of means given by the expression above verifies several properties of interest, which we establish in the following propositions.

Taking into account the relation A = B^{-1} and that, for any invertible diagonal matrix D, the matrix D B is also an AJD with inverse A D^{-1}, the family of solutions comprises one mean for each choice of the scaling D.

If f (^{-1}^{-1}) ^{-1} and any invertible diagonal

Then, note that the family of means defined by (

The family of means defined above verifies the congruence invariance for any invertible matrix F, any well-defined AJD with inverse A = B^{-1} and any invertible diagonal scaling D.

We need to prove that, for any invertible matrix F with inverse F^{-1} and any invertible diagonal scaling matrix D, the mean of the transformed set {F C_k F^T} equals F times the mean of the set {C_k} times F^T.

Using (

Developing the products of determinants and since det(F C F^T) = det^2(F) det(C), the result follows.

If

We need to prove that

The result follows immediately from the invariance under rescaling of criterion (

So far we have considered the properties of the whole family of means with the general form given above, for a set {C_1, …, C_K}. We now single out a specific member of the family to serve as our approximation of the FI mean. This involves choosing a specific scaling of the AJD solution B.

The uniqueness of this solution, regardless of the choice of the AJD algorithm, is guaranteed by the following:

Let B, with inverse A = B^{-1}, be an invertible AJD of the set {C_1, …, C_K}. Scaling B so as to satisfy the chosen condition determines the solution uniquely, out of sign and permutation ambiguities.

Once matrix B has been found and scaled, the mean is obtained in closed form.

Without loss of generality, hereafter we will choose

In order to satisfy (

Let B be the AJD of the set {C_1, …, C_K} minimizing the criterion above; its scaling is found by the following iterative procedure:

Set

Repeat

Until
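The general form of the family of means built on an AJD matrix B can be sketched as follows in Python/NumPy. As an assumption consistent with the text, we use the form G = A exp((1/K) Σ_k ln(B C_k B^T)) A^T with A = B^{-1} and uniform weights, and we test it on the K = 2 case, where an exact JD exists and the FI mean must be recovered:

```python
import numpy as np
from scipy.linalg import eigh

def spd_fun(C, fun):
    """C = U diag(w) U^T -> U diag(fun(w)) U^T."""
    w, U = np.linalg.eigh(C)
    return (U * fun(w)) @ U.T

def mean_from_ajd(Cs, B):
    """Family-of-means form (assumed from the text):
    G = A exp((1/K) sum_k ln(B C_k B^T)) A^T, with A = B^{-1}."""
    A = np.linalg.inv(B)
    L = np.mean([spd_fun(B @ C @ B.T, np.log) for C in Cs], axis=0)
    return A @ spd_fun(L, np.exp) @ A.T

# K = 2: the generalized eigenvectors give an exact JD (B = V^T),
# and the formula recovers the FI mean C1 # C2
C1, C2 = np.diag([1.0, 16.0]), np.diag([9.0, 4.0])
w, V = eigh(C2, C1)
G = mean_from_ajd([C1, C2], V.T)
assert np.allclose(G, np.diag([3.0, 8.0]))   # = C1 # C2 for this pair
```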

Note that instead of the average FI distance

In fact, (

The ALE mean verifies the congruence invariance property (50).

We need to show that for any invertible matrix

Let B_1 be a well-defined AJD of set {C_1, …, C_K} with inverse A_1, and B_2 a well-defined AJD of set {F C_1 F^T, …, F C_K F^T} with inverse A_2, both satisfying the scaling condition above.

Because of Proposition 2, if matrix B_1 approximately diagonalizes the set {C_1, …, C_K}, so does matrix B_2 F; since both minimize the AJD criterion for {C_1, …, C_K}, they are equal out of a permutation indeterminacy that we can ignore because of proposition 4. As a consequence A_1 = (B_2 F)^{-1} = F^{-1} A_2 and thus A_2 = F A_1. Making the substitutions we obtain the result.

The ALE mean verifies the self-duality property (51) if the AJD solution B is self-dual in the sense of definition 3.

Self-duality of the ALE mean is verified if

Using definition 2 we have

Using ln(C^{-1}) = −ln(C) and exp(C)^{-1} = exp(−C), the result follows.

Note that the AJD matrix satisfying the criterion above applied to the inverted input matrices C_k^{-1} verifies this self-duality condition.

As mentioned in the introduction, the estimation of the FI geometric mean of a set of SPD matrices has proven useful in a number of diverse engineering applications [

The simulated input SPD matrices are generated as

    C_k = A_True Λ_k A_True^T + N_k,

where A_True ∈ ℝ^{N×N} (the true mixing matrix) has entries randomly drawn from a standard Gaussian distribution, Λ_k is a diagonal matrix with entries randomly drawn from a squared standard Gaussian distribution and bounded below by 10^{-4}, and N_k is a noise matrix obtained as (1/N) E E^T, where E ∈ ℝ^{N×N} is a matrix with entries randomly drawn from a Gaussian distribution with mean zero and standard deviation σ.
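This generating model can be sketched as follows in Python/NumPy (the seed and the value of σ are illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(42)
N, K, sigma = 10, 100, 0.1

A_true = rng.standard_normal((N, N))   # true mixing matrix, Gaussian entries
Cs = []
for _ in range(K):
    # diagonal entries: squared standard Gaussian, floored at 1e-4
    d = np.maximum(rng.standard_normal(N) ** 2, 1e-4)
    # noise term (1/N) E E^T with Gaussian E of standard deviation sigma
    E = rng.normal(0.0, sigma, (N, N))
    Nk = (E @ E.T) / N
    Cs.append(A_true @ np.diag(d) @ A_true.T + Nk)

# every generated matrix is symmetric positive definite
assert all(np.allclose(C, C.T) for C in Cs)
assert all(np.linalg.eigvalsh(C).min() > 0 for C in Cs)
```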

The simulations are done generating data according to (

The arrows link the FI estimation with the corresponding LE and Bha estimations.

The simulations were repeated 100 times, with N = 10 and K = 100. Notice the different scales on the y-axes.

In this paper we explored the relationship between the approximate joint diagonalization of a SPD matrix set and its geometric mean. After appropriate scaling, the inverse of the joint diagonalizer of two SPD matrices is a square root of their geometric mean. For the general case of a SPD matrix set comprising more than two matrices, we have studied a family of geometric means that includes the geometric mean according to the Fisher information metric (the FI mean) and the log-Euclidean mean. We have then introduced a specific instance of this family, which is computationally attractive and does not require a search for the optimal step size. We have shown that it approximates the FI geometric mean much better than the log-Euclidean mean. Indeed, this mean, named the ALE mean, can be conceived as an improved version of the log-Euclidean mean, in that (i) it satisfies the congruence invariance, and (ii) like the log-Euclidean mean it has the same determinant as the FI mean, but a much smaller trace, hence a trace much closer to that of the FI mean. The ALE mean can be computed by running an AJD algorithm followed by a specific scaling obtained by a simple iterative procedure. The AJD algorithm developed by Pham [