One of the authors (KN) is a paid employee with Kasikorn Securities. This does not alter the authors' adherence to the PLOS ONE policies on sharing data and materials.
Conceived and designed the experiments: KN AA. Performed the experiments: KN. Analyzed the data: KN AA. Contributed reagents/materials/analysis tools: KN AA. Wrote the paper: KN AA.
Correlation coefficients among multiple variables are commonly described in the form of matrices. Applications of such correlation matrices can be found in many fields, such as finance, engineering, statistics, and medicine. This article proposes an efficient way to sequentially obtain the theoretical bounds of correlation coefficients together with an algorithm to generate n-dimensional correlation matrices.
Many important properties of financial models, engineering problems, and biological systems can be represented as correlation matrices, which describe the linear relationships among variables. These correlation matrices are not always known; therefore, correlation matrices are an integral part of simulation techniques for solving or analyzing problems in fields such as signal processing.
To create a correlation matrix, it is important to ensure that it is valid, meaning that the matrix must be symmetric and positive semidefinite, with a unit diagonal and all other elements in the closed interval [−1, 1]. In contrast, an invalid correlation matrix is one in which the assets or variables cannot be correlated according to the specified relationships. The simplest method for constructing a correlation matrix is rejection sampling, which generates each correlation coefficient from a uniform random variable on the closed interval [−1, 1]. We then check whether the matrix is positive semidefinite and, if not, generate another correlation matrix; this procedure is repeated until a valid matrix is obtained. Further details of rejection sampling are given later in this article. For a low-dimensional matrix, rejection sampling is relatively easy to apply, but when the dimension is greater than or equal to four, the chance of finding a valid correlation matrix becomes very low. Since the number of variables in physical or economic systems is normally considerably greater than four, the rejection sampling method is inefficient for the large-scale construction of correlation matrices.
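As an illustration, the rejection sampling procedure just described can be sketched in a few lines (a minimal Python/NumPy sketch; the function and variable names are illustrative only, not part of the algorithms studied in this article):

```python
import numpy as np

def rejection_sample_corr(n, rng=None, max_tries=100_000):
    """Draw a valid n-by-n correlation matrix by rejection sampling:
    propose uniform coefficients and keep the first proposal whose
    eigenvalues are all non-negative (positive semidefinite)."""
    rng = np.random.default_rng() if rng is None else rng
    iu = np.triu_indices(n, k=1)
    for _ in range(max_tries):
        c = np.eye(n)
        c[iu] = rng.uniform(-1.0, 1.0, size=iu[0].size)   # upper triangle
        c = c + c.T - np.eye(n)                           # symmetrize, unit diagonal
        if np.linalg.eigvalsh(c).min() >= 0.0:            # valid -> accept
            return c
    raise RuntimeError("acceptance rate too low; no valid matrix found")

C = rejection_sample_corr(3)
```

For n = 3 roughly 60% of proposals are accepted, but the acceptance rate collapses as n grows, as the run-time comparison later in the article shows.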
Instead, for high-dimensional problems, there are several techniques for generating a correlation matrix. These can be classified, based on the relevant objectives or constraints, as follows:
Generation of a correlation matrix with predetermined eigenvalues and spectrum
Generation of a correlation matrix with a given mean value
Generation of a correlation matrix based on a random Gram matrix
Generation of a correlation matrix in which each correlation coefficient is distributed within its boundaries
This article focuses on the fourth approach, presenting an efficient algorithm to calculate the theoretical boundaries of correlation coefficients without the use of optimization techniques. Instead, the theoretical boundaries of each correlation coefficient are calculated from the mathematical structure of the correlation matrix constructed by hypersphere decomposition.
It is important to have a common understanding of the definition of a valid correlation matrix. Such a matrix conforms to the following properties:
All diagonal entries must be equal to one;
Off-diagonal elements consist entirely of real numbers in the closed interval [−1, 1];
The matrix is symmetric; and
The matrix is positive semidefinite.
The first three requirements are relatively easy to satisfy. However, the final property of being positive semidefinite requires all eigenvalues to be greater than or equal to zero.
Interestingly, a valid correlation matrix C can always be expressed as C = BB^T, where B is a lower triangular matrix whose entries are built from sines and cosines of correlative angles θ_ij in [0, π] (the hypersphere decomposition). It is evident from (4) that every row of the matrix B has unit norm, so C = BB^T automatically has a unit diagonal and is positive semidefinite. Thus, a valid correlation matrix can be calculated whenever the correlative angle matrix Θ is known.
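This construction can be sketched directly, assuming the standard hypersphere decomposition in which b_i1 = cos θ_i1, b_ij = cos θ_ij · Π_{k<j} sin θ_ik for 1 < j < i, and b_ii = Π_{k<i} sin θ_ik (a minimal Python/NumPy sketch; the names are illustrative):

```python
import numpy as np

def corr_from_angles(theta):
    """Hypersphere decomposition: build B row by row from the angles
    theta[i, j] (j < i, in radians) and return C = B @ B.T.  Every row
    of B has unit norm, so C has a unit diagonal, and C is positive
    semidefinite because it is a Gram matrix."""
    n = theta.shape[0]
    B = np.zeros((n, n))
    for i in range(n):
        sin_prod = 1.0                      # running product of sines
        for j in range(i):
            B[i, j] = np.cos(theta[i, j]) * sin_prod
            sin_prod *= np.sin(theta[i, j])
        B[i, i] = sin_prod                  # closes the unit row norm
    return B @ B.T

# Any lower triangular angles in [0, pi] produce a valid matrix.
rng = np.random.default_rng(0)
theta = np.tril(rng.uniform(0.0, np.pi, size=(4, 4)), k=-1)
C = corr_from_angles(theta)
```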
Let us assume that the four-dimensional correlative angle matrix is:
The corresponding matrix B is:
Finally, the correlation matrix is:
As shown in (6) to (10), a valid correlation matrix can be constructed from the matrix of correlative angles.
Correlation coefficients in the first column (c_i1 = cos θ_i1) can take any value in the closed interval [−1, 1], so their bounds are simply −1 and 1 and no calculation is required. Other correlation coefficients (c_ij with j ≥ 2) depend on the coefficients already fixed in earlier columns: c_ij = Σ_{k<j} b_ik b_jk + cos θ_ij · b_jj Π_{k<j} sin θ_ik, and letting cos θ_ij range over [−1, 1] yields the theoretical lower and upper bounds. Because all θ_ij lie in [0, π], the sine products are non-negative, so the bounds can be computed sequentially, one coefficient at a time.

Coefficient  Lower bound  Upper bound  Calculation required
c21  −1  1  No
c31  −1  1  No
c41  −1  1  No
c32  —  —  Yes
c42  —  —  Yes
c43  —  —  Yes
The same logic extends readily to higher-dimensional correlation matrices, albeit with longer formulas and computational procedures.
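The bound computation described above can be expressed compactly. Assuming the hypersphere decomposition, once the earlier columns are fixed we have c_ij = m + cos θ_ij · r, with m = Σ_{k<j} b_ik b_jk and r = b_jj √(1 − Σ_{k<j} b_ik²) ≥ 0, so the bounds are m ± r. A minimal Python/NumPy sketch (the function name and 0-based indices are our own):

```python
import numpy as np

def coeff_bounds(B, i, j):
    """Bounds of c[i, j] (i > j, 0-based) once all coefficients in
    earlier columns are fixed: c_ij = m + cos(theta_ij) * r, so sweeping
    cos(theta_ij) over [-1, 1] gives the interval [m - r, m + r]."""
    m = float(B[i, :j] @ B[j, :j])                     # fixed part of c_ij
    r = float(B[j, j]) * np.sqrt(max(0.0, 1.0 - float(B[i, :j] @ B[i, :j])))
    return m - r, m + r

B = np.zeros((4, 4))
B[0, 0] = 1.0                       # row 0 of B is always the first unit vector
lo1, hi1 = coeff_bounds(B, 1, 0)    # first-column coefficient -> (-1.0, 1.0)

B[1, 0], B[1, 1] = 0.6, 0.8         # fix c_21 = 0.6
B[2, 0] = 0.0                       # fix c_31 = 0.0
lo2, hi2 = coeff_bounds(B, 2, 1)    # c_32 restricted to (-0.8, 0.8)
```

The last call reproduces the familiar three-variable restriction: with c_21 = 0.6 and c_31 = 0, the coefficient c_32 must lie within ±√((1 − 0.6²)(1 − 0²)) = ±0.8.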
This section describes an algorithm that obtains a correlation matrix by sequentially computing the boundaries of each correlation coefficient, as described in the earlier section, and generating uniform random variables (other bounded distributions can always be substituted) within these boundaries. It is important to note that no optimization is needed to calculate the boundaries of each correlation coefficient. This non-optimization approach is the major difference between our work and that presented in
For each row i = 2, 3, …, n:
For each column j = 1, 2, …, i − 1:
Calculate the lower bound (L_ij) and upper bound (U_ij) of the correlation coefficient c_ij. The method for calculating these boundaries is explained in the earlier section.
If the boundary gap (U_ij − L_ij) is smaller than a predefined threshold, discard the current matrix and return to step 1; during our large numerical experiments, numerical instability occurred whenever the boundary gap became too small.
Extract the correlation coefficient from the corresponding uniform random number λ_ij, scaled to its boundaries: c_ij = L_ij + λ_ij (U_ij − L_ij).
End
End
Create a symmetric correlation matrix with unit diagonal elements based on all generated correlation coefficients.
Check the minimum eigenvalue. If it is negative, the correlation matrix is invalid. Otherwise, the correlation matrix is valid.
Reject an invalid correlation matrix and regenerate it by returning to step 1.
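The sequential procedure above can be sketched end-to-end as follows. This is a minimal Python/NumPy sketch under the hypersphere parameterization, without the random-reordering refinement discussed later; the threshold eps and all names are our own, not the paper's RandomCorr implementation:

```python
import numpy as np

def random_corr(n, eps=1e-10, rng=None, max_tries=100):
    """Sequentially sample every coefficient uniformly within its
    theoretical bounds (hypersphere decomposition), then confirm
    validity with an eigenvalue check, restarting on failure."""
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(max_tries):
        B = np.zeros((n, n))
        B[0, 0] = 1.0
        C = np.eye(n)
        ok = True
        for i in range(1, n):
            sin_prod = 1.0                     # prod_{k<j} sin(theta_ik)
            for j in range(i):
                m = B[i, :j] @ B[j, :j]        # fixed part of c_ij
                r = B[j, j] * sin_prod         # half of the boundary gap
                if r < eps:                    # gap too small: numerically
                    ok = False                 # unstable, so restart
                    break
                c = rng.uniform(m - r, m + r)  # uniform draw within bounds
                cos_t = (c - m) / r            # implied cos(theta_ij)
                B[i, j] = cos_t * sin_prod
                sin_prod *= np.sqrt(max(0.0, 1.0 - cos_t ** 2))
                C[i, j] = C[j, i] = c
            if not ok:
                break
            B[i, i] = sin_prod
        if ok and np.linalg.eigvalsh(C).min() >= -1e-10:
            return C
    raise RuntimeError("failed to generate a valid correlation matrix")

C = random_corr(6)
```

Because each coefficient is drawn inside its theoretical bounds, the final eigenvalue check fails only through floating-point round-off or a degenerate boundary gap, which is exactly the instability the threshold guards against.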
In addition, from (1) to (4), we can generate a valid correlation matrix directly from a random sample of correlative angles. Unfortunately, based on our experiments, this direct method is not numerically stable; as a result, the matrices generated by this method may be unusable in some applications. Thus, we believe that our new algorithm is superior in terms of numerical stability.
For a five-dimensional correlation matrix, let us assume that the uniform random matrix is:
The lower-bound matrix is:
As the minimum eigenvalue of the resulting matrix is non-negative, the correlation matrix is valid.
All numerical tests in this study were conducted with MATLAB 7.8.0 (R2009a) on an Intel(R) Core
Rejection sampling method (RS): The rejection sampling method uses uniform random variables in the closed interval [−1, 1] to represent each correlation coefficient in the symmetric correlation matrix. The correlation matrix is rejected if it is invalid.
Randcorr function of MATLAB (RC): This algorithm is implemented as a MATLAB function, and is based on the work in
The MATLAB code for the NA algorithm (denoted as RandomCorr) is available at
The computational performance of each algorithm is primarily measured by the expected run time (T), defined as the average run time per attempt divided by the probability of generating a valid matrix.

Run-time comparison. For each dimension n, the three column groups report the rate of valid matrices (%), the average run time per attempt (s), and the expected run time T (s) for the NA, RS, and RC algorithms.

n  NA  RS  RC  NA  RS  RC  NA  RS  RC
2  100  100  100  0.0492  0.0149  0.3819  0.0492  0.0149  0.3819 
3  100  61.678  100  0.0710  0.0185  0.4720  0.0710  0.0300  0.4720 
4  100  18.2341  100  0.0900  0.0204  0.5688  0.0900  0.1121  0.5688 
5  100  2.1723  100  0.1164  0.0229  0.6521  0.1164  1.0532  0.6521 
6  100  0.1009  100  0.1501  0.0254  0.7472  0.1501  25.19  0.7472 
7  100  0.001  100  0.1827  0.0385  0.8567  0.1827  3,849.7  0.8567 
8  100  0  100  0.2306  0.0321  0.9669  0.2306  Inf.  0.9669 
9  100  0  100  0.2804  0.0355  1.1653  0.2804  Inf.  1.1653 
10  100  0  100  0.3304  0.0404  1.2686  0.3304  Inf.  1.2686 
11  100  0  100  0.4039  0.0449  1.2318  0.4039  Inf.  1.2318 
12  100  0  100  0.4586  0.0485  1.3230  0.4586  Inf.  1.3230 
13  100  0  100  0.5513  0.0546  1.4448  0.5513  Inf.  1.4448 
14  100  0  100  0.6138  0.0589  1.5067  0.6138  Inf.  1.5067 
15  100  0  100  0.6987  0.0647  1.6531  0.6987  Inf.  1.6531 
16  100  0  100  0.7788  0.0785  1.7076  0.7788  Inf.  1.7076 
17  100  0  100  0.8957  0.0811  1.8294  0.8957  Inf.  1.8294 
18  100  0  100  1.0106  0.0873  1.9429  1.0106  Inf.  1.9429 
19  100  0  100  1.0990  0.0907  2.0996  1.0990  Inf.  2.0996 
20  100  0  100  1.2094  0.0974  2.2008  1.2094  Inf.  2.2008 
21  100  0  100  1.3406  0.1051  2.2840  1.3406  Inf.  2.2840 
22  100  0  100  1.4722  0.1132  2.3952  1.4722  Inf.  2.3952 
23  100  0  100  1.6269  0.1217  2.5304  1.6269  Inf.  2.5304 
24  100  0  100  1.7746  0.1296  2.6631  1.7746  Inf.  2.6631 
25  100  0  100  1.9446  0.1393  2.7386  1.9446  Inf.  2.7386 
26  100  0  100  2.1356  0.1492  2.8582  2.1356  Inf.  2.8582 
27  100  0  100  2.2533  0.1585  2.9899  2.2533  Inf.  2.9899 
28  100  0  100  2.4576  0.1689  3.0942  2.4576  Inf.  3.0942 
29  100  0  100  2.6411  0.1806  3.2981  2.6411  Inf.  3.2981 
30  100  0  100  2.8306  0.1904  3.4048  2.8306  Inf.  3.4048 
35  100  0  100  3.9381  0.3315  4.0185  3.9381  Inf.  4.0185 
40  100  0  100  5.3749  0.3971  4.7135  5.3749  Inf.  4.7135 
45  100  0  100  6.8185  0.5067  5.7925  6.8185  Inf.  5.7925 
50  100  0  100  8.5822  0.6172  6.9464  8.5822  Inf.  6.9464
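The last column group of the table follows from the first two: the expected run time is the average run time per attempt divided by the probability of producing a valid matrix, which is why RS shows "Inf." once its valid-matrix rate reaches zero. A small sketch (a hypothetical helper; the numbers are taken from the n = 3 and n = 5 rows above):

```python
# Expected run time = average run time per attempt / success probability.
def expected_run_time(time_per_attempt_s, valid_rate_pct):
    """Hypothetical helper reproducing the table's third column group."""
    if valid_rate_pct == 0.0:
        return float("inf")          # reported as "Inf." in the table
    return time_per_attempt_s / (valid_rate_pct / 100.0)

# n = 3 row: RS takes 0.0185 s per attempt and succeeds 61.678% of the time.
t_rs_3 = expected_run_time(0.0185, 61.678)   # ≈ 0.0300 s, as tabulated
# n = 5 row: a 2.1723% success rate pushes the expected time past one second.
t_rs_5 = expected_run_time(0.0229, 2.1723)   # ≈ 1.05 s
```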
With a
To compare the PDF of the coefficients of correlation matrices,



Statistical measure  NA  RS  RC  NA  RS  RC 
Mean  −0.001  −0.0001  −0.0009  0.0004  −0.0004  0.001 
Median  −0.0024  −0.0001  −0.0015  −0.0013  −0.0006  0.0011 
Standard Deviation  0.5289  0.4079  0.2779  0.5281  0.4086  0.2901 
10th percentile  −0.7288  −0.5515  −0.3536  −0.7268  −0.5528  −0.3667
90th percentile  0.7301  0.5503  0.3517  0.7297  0.5516  0.3697
Skewness  0.0062  0.0012  −0.0015  0.0027  −0.0018  0.009 
Kurtosis  1.9421  2.2551  3.0207  1.9444  2.2496  2.6727 
In this paper, we have presented an efficient method to calculate the boundaries of correlation coefficients. We also demonstrated a technique for generating correlation matrices using any bounded random variable distribution within the boundaries of each correlation coefficient. However, this method causes the correlation coefficients to be unevenly distributed. Thus, we incorporated a technique for random reordering to ensure the even distribution of all correlation coefficients. The performance of the proposed algorithm was compared to that of other algorithms. It was shown that the new algorithm could efficiently construct correlation matrices, particularly when the dimension of the matrix was in the range 4–35. In theory, our algorithm should always return valid correlation matrices. However, without setting a threshold factor and using rejection sampling logic, the algorithm exhibited some numerical instability when the dimension became large. It is possible to adjust invalid matrices to form valid ones; this method has been developed in many studies