The authors have declared that no competing interests exist.

Conceived and designed the experiments: BW. Performed the experiments: BW. Analyzed the data: BW WS ZM. Contributed reagents/materials/analysis tools: BW. Wrote the paper: BW.

The standard deviational ellipse (SDE) has long served as a versatile GIS tool for delineating the geographic distribution of features of interest. This paper first summarizes two existing models for calculating the SDE, and then proposes a novel approach to constructing the same SDE based on spectral decomposition of the sample covariance, by which the SDE concept is naturally generalized to higher dimensional Euclidean space as the standard deviational hyper-ellipsoid (SDHE). Rigorous recursion formulas are then derived for calculating the confidence levels of a scaled SDHE with arbitrary magnification ratio in a space of any dimension. In addition, an iterative algorithm based on the inexact Newton method is proposed for solving for the magnification ratio of a scaled SDHE when the confidence probability and space dimensionality are pre-specified. These results provide an efficient alternative to the traditional lookup of tabulated chi-square distributions. Finally, synthetic data are used to generate SDEs and SDHEs scaled by magnification ratios of 1 to 3, and an exploratory analysis by means of SDEs and SDHEs is conducted to measure the spread concentrations of Hong Kong’s H1N1 cases in 2009.

The standard deviation is one of the classical statistical measures of the dispersion of univariate features around their center. Its extension to two-dimensional space is the standard deviational ellipse (SDE), which was first proposed by Lefever [

The SDE is widely used in many research fields and commercial industries. For instance, Smith and Cheeseman [

As a GIS tool for delineating spatial point data, SDE is mainly determined by three measures: average location, dispersion (or concentration) and orientation. In addition to the traditional mean center (gravity of the distribution) suggested by Lefever [

Although the SDE has been applied extensively in various fields since 1926, it is still sometimes described incorrectly. For instance, from the latest resources in

To clarify the implications of the SDE, Sect. 2 first summarizes two existing models for deriving the SDE calculation formulas, and then proposes a novel approach to constructing the same SDE based on spectral decomposition of the sample covariance, by which the SDE concept is extended to higher dimensional Euclidean space as the standard deviational hyper-ellipsoid (SDHE). Most importantly, rigorous recursion formulas are then derived for calculating the confidence levels of a scaled SDHE with arbitrary magnification ratio in a space of any dimension. In addition, an iterative algorithm based on the inexact Newton method is proposed for solving for the magnification ratio of a scaled SDHE when the confidence probability and space dimensionality are pre-specified. Finally, synthetic data are used to generate SDEs and SDHEs scaled by magnification ratios of 1 to 3 in two and three dimensional spaces, respectively, and an exploratory analysis by means of SDEs and SDHEs is conducted to measure the spread concentrations of Hong Kong’s H1N1 cases in 2009.

The first two subsections below briefly summarize two classical approaches to generating the standard deviational ellipse in 2D. After that, a novel approach based on spectral decomposition of the covariance matrix is introduced, which arrives at the same calculation formula for the SDE. This spectral-decomposition approach is adopted in Sect. 3 to generalize standard deviational (hyper-)ellipsoids to higher dimensional Euclidean space.

The standard deviational ellipse delineates the trend of a geographical distribution by summarizing both the dispersion and the orientation of the observed samples. Several approaches to obtaining the computational formula of the SDE already exist. The method discussed next, presented by Yuill [

Suppose a series of independent identically distributed samples (_{i}, _{i}),

The maximum likelihood estimator [

It should be noticed that rotating ^{T}and covariance matrix

Another method described by Cromley [^{2}^{2}

Using the symbols introduced in

It must be said there are two common textbook definitions of variance and covariance, as well as the standard deviation. One is the unbiased estimator while the other one is the maximum likelihood estimator proved by Li and Racine [

After spectral decomposition of the sample covariance (7), SDE can be constructed by assigning square roots of eigenvalues as the lengths of its semi-major and semi-minor axes [

In conclusion, the above three approaches actually all calculate the same SDE according to formulas (
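
The spectral-decomposition construction can be sketched for the 2D case as follows. This is a minimal illustration rather than the paper's implementation: the function name `standard_deviational_ellipse` is hypothetical, the covariance is normalized by the sample count (the maximum likelihood convention, assumed here), and the eigenvalues of the 2×2 covariance are obtained in closed form.

```python
import math

def standard_deviational_ellipse(points):
    """Return (center, semi_major, semi_minor, theta) for 2D points.

    Assumption: covariance is divided by n (maximum likelihood form);
    dividing by n - 1 instead would give the unbiased variant.
    theta is the angle of the major axis from the x-axis, in radians.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n

    # Closed-form eigenvalues of the symmetric 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2.0
    root = math.hypot((sxx - syy) / 2.0, sxy)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)  # guard against round-off

    # Orientation of the eigenvector belonging to the larger eigenvalue.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)

    # Semi-axis lengths are the square roots of the eigenvalues.
    return (mx, my), math.sqrt(lam_major), math.sqrt(lam_minor), theta

center, a, b, theta = standard_deviational_ellipse(
    [(-2.0, 0.0), (2.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
)
```

For this toy input the covariance is diagonal, so the major axis lies along the x-axis and the semi-axes are the square roots of the two variances.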

In Sect. 2, three approaches for constructing the SDE were summarized and compared for samples distributed in two-dimensional space. This section generalizes the SDE concept to higher dimensional Euclidean space by means of the spectral decomposition of the covariance matrix, yielding the standard deviational hyper-ellipsoid (SDHE). Under the assumption that the samples follow a Gaussian distribution, it also derives the relationship between the magnification ratio of a scaled SDHE and its confidence level, that is, the probability that a randomly scattered point falls inside it.

Suppose ^{n} be an n-dimensional Gaussian random vector, that is _{1},_{2},…, _{m} represent _{i}_{1}_{2}≥…≥_{n}. Due to the symmetry of covariance matrix

Proceeding in this way, now comes to such an interesting question: how could this SDHE defined by _{n}) In other words, Mahalanobis transformation eliminates correlation between the variables and standardizes each variable with variance. Apparently, random vector

This section establishes, through rigorous derivation, the relationship between the magnification ratio of a scaled SDHE and its confidence level, that is, the probability that a randomly scattered point falls inside the scaled ellipsoid.

The following scalar quantity

As mentioned above, the SDE serves as a versatile spatial statistical tool for measuring the geographical distribution of features. For this reason, it has been embedded into many commercial software packages, such as ArcGIS and Stata [

_{n}_{n}), whose confidence region is exactly a sphere as explained in subsection 3.1; namely,

For 2D case,

It’s worth noting that an inverse formula here exists,
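
For reference, the 2D relationship and its inverse are commonly written in closed form. The block below assumes the standard result for a bivariate Gaussian; the symbols $r$ (magnification ratio) and $P_2(r)$ (confidence level) are notation chosen here and may differ from the paper's.

```latex
P_2(r) = 1 - e^{-r^{2}/2},
\qquad
r = \sqrt{-2\,\ln\bigl(1 - P_2\bigr)} .
```

For example, $r = 2$ gives $1 - e^{-2} \approx 0.8647$, and a 95% confidence level corresponds to $r = \sqrt{-2\ln 0.05} \approx 2.4477$.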

Before proceeding to the general formulas applicable in

Computation of confidence levels using

Dimensionality \ Magnification factor | 1 | 2 | 3 | 4 | 5 | 6 | 7
---|---|---|---|---|---|---|---
1 | 0.6827 | 0.9545 | 0.9973 | 0.9999 | 1.0000 | 1.0000 | 1.0000
2 | 0.3935 | 0.8647 | 0.9889 | 0.9997 | 1.0000 | 1.0000 | 1.0000
3 | 0.1987 | 0.7385 | 0.9707 | 0.9989 | 1.0000 | 1.0000 | 1.0000
4 | 0.0902 | 0.5940 | 0.9389 | 0.9970 | 0.9999 | 1.0000 | 1.0000
5 | 0.0374 | 0.4506 | 0.8909 | 0.9932 | 0.9999 | 1.0000 | 1.0000
6 | 0.0144 | 0.3233 | 0.8264 | 0.9862 | 0.9997 | 1.0000 | 1.0000
7 | 0.0052 | 0.2202 | 0.7473 | 0.9749 | 0.9992 | 1.0000 | 1.0000
8 | 0.0018 | 0.1429 | 0.6577 | 0.9576 | 0.9984 | 1.0000 | 1.0000
9 | 0.0006 | 0.0886 | 0.5627 | 0.9331 | 0.9970 | 1.0000 | 1.0000
10 | 0.0002 | 0.0527 | 0.4679 | 0.9004 | 0.9947 | 0.9999 | 1.0000
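
The tabulated confidence levels can be reproduced with a short recursive computation. The sketch below assumes the standard two-step recurrence of the chi-square distribution function evaluated at the squared magnification ratio, which may differ in form from the paper's recursion formulas; the function name `confidence_levels` is hypothetical. It illustrates how values from dimension n − 2 are reused for dimension n.

```python
import math

def confidence_levels(max_dim, r):
    """Return [P_1(r), ..., P_max_dim(r)] for magnification ratio r.

    Assumed recurrence (standard chi-square identity with x = r**2):
        P_n = P_{n-2} - (x/2)**((n-2)/2) * exp(-x/2) / Gamma(n/2)
    with base cases P_1 = erf(r / sqrt(2)) and P_2 = 1 - exp(-x/2).
    """
    x = r * r
    levels = []
    for n in range(1, max_dim + 1):
        if n == 1:
            p = math.erf(r / math.sqrt(2.0))
        elif n == 2:
            p = 1.0 - math.exp(-x / 2.0)
        else:
            # levels[n - 3] holds P_{n-2}, computed two iterations earlier.
            correction = (x / 2.0) ** ((n - 2) / 2.0) * math.exp(-x / 2.0)
            p = levels[n - 3] - correction / math.gamma(n / 2.0)
        levels.append(p)
    return levels

# One column of the table: magnification factor 3, dimensions 1 to 10.
column_r3 = confidence_levels(10, 3.0)
```

Each call fills one column of the table, and every entry beyond the second dimension costs a single correction term.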

Observed from

_{0},_{0},_{a},_{r}

Evaluate _{0}_{n}(r_{0})-_{a}_{r}

Calculate the Newton direction ^{-1}

^{2}

End While

End While

Input arguments for this algorithm are the initial iterate _{0} with default value

Dimensionality \ Confidence level (%) | 80.0 | 85.0 | 90.0 | 95.0 | 99.0 | 99.9
---|---|---|---|---|---|---
1 | 1.2816 | 1.4395 | 1.6449 | 1.9600 | 2.5758 | 3.2905
2 | 1.7941 | 1.9479 | 2.1460 | 2.4477 | 3.0349 | 3.7169
3 | 2.1544 | 2.3059 | 2.5003 | 2.7955 | 3.3682 | 4.0331
4 | 2.4472 | 2.5971 | 2.7892 | 3.0802 | 3.6437 | 4.2973
5 | 2.6999 | 2.8487 | 3.0391 | 3.3272 | 3.8841 | 4.5293
6 | 2.9254 | 3.0735 | 3.2626 | 3.5485 | 4.1002 | 4.7390
7 | 3.1310 | 3.2784 | 3.4666 | 3.7506 | 4.2983 | 4.9317
8 | 3.3212 | 3.4680 | 3.6553 | 3.9379 | 4.4822 | 5.1112
9 | 3.4989 | 3.6453 | 3.8319 | 4.1133 | 4.6547 | 5.2799
10 | 3.6663 | 3.8123 | 3.9984 | 4.2787 | 4.8176 | 5.4395
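
The magnification ratios in this table can be approximated with a simple root-finding sketch. The code below uses a damped Newton iteration with step halving, which is a stand-in and may differ in detail from the paper's inexact-Newton algorithm; the names `confidence_level`, `radial_density` and `magnification_ratio`, the default start `sqrt(n)`, and the tolerances are all assumptions made for illustration.

```python
import math

def confidence_level(n, r):
    """P_n(r) via the standard chi-square recurrence (assumed form)."""
    x = r * r
    k = 1 if n % 2 else 2
    p = math.erf(r / math.sqrt(2.0)) if k == 1 else 1.0 - math.exp(-x / 2.0)
    while k < n:
        k += 2
        p -= (x / 2.0) ** ((k - 2) / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0)
    return p

def radial_density(n, r):
    """Derivative of P_n with respect to r (chi distribution density)."""
    norm = 2.0 ** (n / 2.0 - 1.0) * math.gamma(n / 2.0)
    return r ** (n - 1) * math.exp(-r * r / 2.0) / norm

def magnification_ratio(n, p, r0=None, tol=1e-10, max_iter=100):
    """Solve P_n(r) = p for r with Newton steps and backtracking."""
    r = math.sqrt(n) if r0 is None else r0  # hypothetical default start
    for _ in range(max_iter):
        residual = confidence_level(n, r) - p
        if abs(residual) < tol:
            break
        step = -residual / radial_density(n, r)
        t = 1.0
        # Halve the step until r stays positive and the residual shrinks.
        while t > 1e-12:
            candidate = r + t * step
            if candidate > 0.0 and abs(confidence_level(n, candidate) - p) < abs(residual):
                r = candidate
                break
            t *= 0.5
        else:
            break  # no acceptable step length was found
    return r

ratio_2d_95 = magnification_ratio(2, 0.95)
```

In 2D the result can be checked against the closed form sqrt(−2 ln(1 − p)), which for p = 0.95 is about 2.4477.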

Seen from

In this section, two groups of synthetic data are employed to generate the 1–3 multiple SDEs and SDHEs in two and three dimensional spaces, respectively, to depict their aggregation extent and demonstrate the relationship between the scaled ellipse (or ellipsoid) size and their corresponding confidence levels.
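
A small simulation can illustrate the relationship between ellipse scale and confidence level in 2D. The sketch below is not the paper's experiment: the mean, the Cholesky factor of the covariance, the sample size and the seed are all hypothetical values chosen for illustration, and the helper name `coverage_of_scaled_sdes` is an assumption.

```python
import math
import random

def coverage_of_scaled_sdes(samples, ratios=(1, 2, 3)):
    """Fraction of 2D samples inside the SDE scaled by each ratio.

    A point lies inside the r-times SDE when its squared Mahalanobis
    distance to the mean center is at most r**2 (covariance divided by n).
    """
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / n
    syy = sum((y - my) ** 2 for _, y in samples) / n
    sxy = sum((x - mx) * (y - my) for x, y in samples) / n
    det = sxx * syy - sxy * sxy

    def squared_distance(x, y):
        dx, dy = x - mx, y - my
        return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

    d2 = [squared_distance(x, y) for x, y in samples]
    return [sum(d <= r * r for d in d2) / n for r in ratios]

rng = random.Random(0)               # fixed seed, for repeatability only
l11, l21, l22 = 2.0, 0.6, 1.0        # hypothetical Cholesky factor entries
samples = []
for _ in range(20000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    samples.append((5.0 + l11 * z1, 3.0 + l21 * z1 + l22 * z2))

empirical = coverage_of_scaled_sdes(samples)
theoretical = [1.0 - math.exp(-r * r / 2.0) for r in (1, 2, 3)]
```

With a large Gaussian sample, the empirical fractions should sit close to the theoretical 2D confidence levels for ratios 1, 2 and 3.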

_{i} ϵ ^{2} are randomly generated from a two dimensional Gaussian vector, that is _{i}~^{T}, and covariance

For better visualization of SDEs in computer imaging, the observed samples can be overlaid with a warning color, for example a gradually varying red layer processed with a transparency function. Intuitively, the transparency should be inversely proportional to the confidence probability density of the features. By incorporating

_{i}ϵ^{3} are randomly generated, following 3D Gaussian distribution, that is _{i} ~ ^{T}, and covariance

The spread of epidemic diseases poses serious risks to both life and the economy. For example, the latest epidemic outbreak in Hong Kong was swine flu virus A (H1N1), which caused hundreds of deaths and widespread fear of fatal infection among residents.

Geographic information science (GIS) serves as a common platform for integrating disease surveillance activities. As one of its significant functional components, the SDE, as well as the SDHE, can help reveal how a disease is distributed and how that distribution evolves, thereby assisting epidemiologists and public health officials in devising more effective strategies to control its spread.

For the epidemic data, a total of 410 human swine influenza cases, with epidemiological date and address, were gathered on a daily basis from 1st May to 26th June, as released by the Centre for Health Protection (CHP), Hong Kong. Addresses of infected buildings were then geocoded into WGS84 coordinates for subsequent mapping. Exploratory analysis with SDEs scaled by magnification ratios of 1 to 3 was then conducted to focus on the areas with the most infected cases (

Further, 1–3 multiple SDHEs (in three-dimensional space) are also employed for highlighting the spatiotemporal concentrations of H1N1 infections (

In this paper, the confidence analysis of the standard deviational ellipse (SDE) and its extension to higher dimensional Euclidean space has been explored comprehensively, from its origin and formula derivations to algorithm implementation and applications. First, two existing models were summarized and a novel approach based on the spectral decomposition of the sample covariance was proposed for calculating the same SDE. The SDE concept was then naturally generalized to higher dimensional Euclidean space as the standard deviational hyper-ellipsoid (SDHE). Next, rigorous recursion formulas were derived for calculating the confidence levels of a scaled SDHE with arbitrary magnification ratio in a space of any dimension. These formulas allow the confidence levels to be tabulated against magnification ratio and space dimensionality more efficiently, because results obtained in low dimensional space are reused in subsequent higher dimensional spaces, whereas the traditional approach to the chi-square distribution relies mainly on the complex computation of the gamma density function. In addition, an iterative algorithm based on the inexact Newton method was proposed for solving for the magnification ratio of a scaled SDHE when the confidence probability and space dimensionality are pre-specified. Together, these allow either the required magnification ratio or the confidence level of an SDHE to be computed from the other in a space of any dimension, providing a more efficient alternative to the traditional lookup of tabulated chi-square distributions.

Finally, synthetic data were used to generate SDEs and SDHEs scaled by magnification ratios of 1 to 3, and an exploratory analysis by means of SDEs and SDHEs was conducted to measure the spread concentrations of Hong Kong’s H1N1 cases in 2009.

It is worth noting that standard deviational ellipses (and SDHEs) are derived under the assumption that the observed samples follow a normal distribution. Therefore, the SDE tool must be used with a degree of caution when measuring the geographic distribution of features of interest. In particular, the area delineated by an SDE may not represent hotspot boundaries, and may produce ambiguous outcomes when the distribution of features is multimodal [

Fortunately, owing to considerable progress over the last three decades, the normal distribution assumption is no longer indispensable for confidence ellipses. Nonetheless, the ideas that emerged during the derivation of the SDE continue to inspire more advanced models, among which the elliptically contoured distribution [^{T}^{-1}
