Random selection of factors preserves the correlation structure in a linear factor model to a high degree

In a very high-dimensional vector space, two randomly-chosen vectors are almost orthogonal with high probability. Starting from this observation, we develop a statistical factor model, the random factor model, in which factors are chosen stochastically based on the random projection method. Randomness of factors has the consequence that correlation and covariance matrices are well preserved in a linear factor representation. It also enables derivation of probabilistic bounds for the accuracy of the random factor representation of time-series, their cross-correlations and covariances. As an application, we analyze reproduction of time-series and their cross-correlation coefficients in the well-diversified Russell 3,000 equity index.


1 Introduction

Vectors in a high-dimensional space
Any two randomly-selected vectors in a high-dimensional vector space are likely to be almost orthogonal to each other (Hecht-Nielsen, 1994; Kaski, 1998). This observation is relevant to time-series analysis, since a long time-series corresponds to a vector in a high-dimensional vector space, and orthogonality of vectors corresponds to uncorrelatedness of time-series. Hence, the time-series corresponding to two randomly-selected vectors are almost uncorrelated.
When the length of the time-series increases, or equivalently, the dimension of the vector space increases, the probability that two randomly-selected time-series are almost uncorrelated increases (Hecht-Nielsen, 1994). If we select a set of, say, n vectors randomly, these vectors are approximately orthogonal to each other if the dimension of the space is sufficiently high. High dimensionality of the data may then in some cases even be an asset: in a high-dimensional space, almost any set of random vectors yields an almost uncorrelated set of factor time-series that can be used as a basis for a linear factor model.
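The near-orthogonality of random vectors is easy to check numerically. The following is a minimal sketch, assuming independent standard normal coordinates; the function name `mean_abs_cosine`, the dimensions and the number of pairs are illustrative choices, not taken from the study.

```python
import numpy as np


def mean_abs_cosine(d, n_pairs=500, seed=0):
    """Average |cos(angle)| over pairs of independent Gaussian vectors in R^d."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n_pairs, d))
    v = rng.standard_normal((n_pairs, d))
    cos = np.sum(u * v, axis=1) / (
        np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1)
    )
    return float(np.mean(np.abs(cos)))


# The average |cosine| shrinks as the dimension grows.
results = {d: mean_abs_cosine(d) for d in (10, 100, 1000)}
for d, value in results.items():
    print(f"d={d:5d}  mean |cos| = {value:.4f}")
```

For Gaussian vectors the typical absolute cosine is of order 1/sqrt(d), so the printed values should fall roughly by a factor of three for each tenfold increase in d.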

Factor models
Factor models are extensively used in financial applications to model asset returns (see, e.g., Campbell et al. (1997)) and to decompose them into loadings on risk factors. The two main types of factor models are fundamental factor models and statistical factor models. In a fundamental factor model, the aim is to find observable asset characteristics, e.g., financial ratios and macro-economic variables, that are capable of explaining the behavior of market stock prices and that are often extrinsic to the asset time-series.
The explanatory fundamental and economic variables can be highly correlated with each other, which may cause, e.g., multi-collinearity in a fundamental factor model. Returns predicted by a fundamental factor model may then be more correlated than the observed returns, which is the main reason for the inclusion of specific risk components in a factor model.
Statistical factor models are a commonly-used alternative to fundamental factor models. In a statistical factor model, factors are extracted from asset returns. Principal component analysis (PCA; see, e.g., Alexander (2001)) is an example of a statistical technique for finding factors from asset time-series.
PCA works well when the analyzed time-series are highly correlated, which may indicate the presence of a common driver. Applications of PCA include models of interest rate term structure, credit spreads, futures, and volatility forwards. In PCA, several principal components often have an intuitive financial interpretation. In ordered highly-correlated systems, the first principal component captures an almost parallel shift in all variables and is generally labeled the common trend component. The second principal component captures an almost linear tilt in the variables, while the higher order principal components capture changes that are quadratic, cubic and so forth (see, e.g., Alexander (2009)). In the equity markets, the higher order principal components may often, but certainly not always, be interpreted as market movements caused by different investment style tilts. Stock and Watson (2002) show that principal components found using PCA provide consistent estimators of the true latent factors in the limit where both the time dimension and the cross-sectional size go to infinity. They extend consistent estimation of the classical factor model with non-correlated errors to approximate factor models with cross-correlated and sectionally-correlated error terms.
Classical factor models include only a handful of factors. The best-known factor model in the literature is likely the Capital Asset Pricing Model (CAPM), which assumes that a single risk factor, the market, drives returns in a portfolio of assets (Sharpe, 1964). A number of factor models have extended this view (see, e.g., Ross (1976); Fama and French (1993)). The recent increase in computational power has enabled the development of models with a large set of factors. Today, factor models are popular in market risk modeling; e.g., the Barra models (Grinold and Kahn, 2000) depend on hand-picked market factors to explain the behavior of the analyzed financial instruments. Boivin and Ng (2006) demonstrate that there are situations in which the use of a larger number of time-series may actually result in a worse factor estimate than a smaller number of time-series. A significant amount of recent literature has been devoted to the issue of consistent estimation under conditions where the number of time-series is large compared to the length of the time-series (e.g., Bun et al. (2017); Ledoit and Wolf (2012)).

Choice of factors
The choice of factors clearly influences the ability of a factor model to explain the investment risk of a portfolio, in particular when the factor model consists of only a few carefully-chosen factors. When the number of factors is large compared to the number of time-series analyzed, it may not matter much which factors are chosen, as long as the factors span a sufficiently large sample space. Even then, the relative importance of factors is often of interest in risk management. However, it is not clear how well an arbitrary set of factors would enable analysis, or at least description, of the risk. This is the issue that we analyze in this study: take a random set of factors and see whether it enables reproduction of the data and its interdependencies.
A good starting point for developing a random factor model is the random projection method (see, e.g., Bingham and Mannila (2001); Vempala (2005)), which consists of a projection of data to a lower-dimensional space by a random matrix. The random projection method has been used, e.g., to reduce the complexity of the data for classification purposes (Kohonen et al., 2000), for structure-preserving perturbation of confidential data in scientific applications (Liu et al., 2006), for data compression (Bingham and Mannila, 2001), for compression of images (Amador, 2007), and in the design of approximation algorithms (Blum, 2006). The random projection allows one to reduce the dimensionality of the investigated problem, often substantially, while preserving the structure of the problem.

This study
It is largely an open question whether and how well randomly-chosen factors can be used to describe a large data set. This is the issue we approach in this study. For this purpose we develop a factor model based on randomly-chosen factor time-series, the random factor model. We show that randomness of factors has certain desirable properties, such as well-defined probabilistic limits on the accuracy of the factor representation. In addition, randomly-chosen factors are almost orthogonal with high probability, and with a proper normalization, they are expected to be orthonormal. We also show how the random factor model converges toward the modeled data when the number of factors increases, and that a random factor model preserves pair-wise correlations well with high probability.
The article is structured as follows. In Section 2, we develop the random factor model based on the random projection method and derive theoretical results describing the model (more details can be found in Appendix A). For example, we show that randomness of factors enables derivation of theoretical results on the accuracy of the model.
As an application of the random factor model, Section 3 provides an analysis of the correlation matrix of the Russell 3,000 equity index using the factor models described in Section 2. We analyze the ability of random factor models to reproduce equity log-return time-series and their correlations and covariances. The reproduction of data in a random factor model is compared with a reproduction obtained using principal component analysis, both at the individual time-series level and at the dependence structure level.
In Section 4, we compare different random factor models and show that the results, or rather their accuracy, are quite universal. In Section 5, we discuss the results and their possible implications.
In Appendix A, we prove a Johnson-Lindenstrauss-like theorem appropriate for the random factor model. It gives probabilistic bounds on the accuracy of the correlation and covariance preservation in a random factor model.

Notations
Let d observations of a time-series $Z : \mathbb{N} \to \mathbb{R}$ be viewed as a vector in the d-dimensional space $\mathbb{R}^d$, where each observation of the time-series corresponds to one coordinate of the vector. The set of N such time-series can be packed into a matrix $X \in \mathbb{R}^{d \times N}$, in which the observations of each time-series form a column. We assume that the time-series data has been preprocessed so that each time-series has zero mean, that is, $\sum_m X_{mb} = 0$ for each b = 1, ..., N. We employ sample statistics in this study. The definitions of the mean $\mu_x$, variance $\sigma_x^2$ and covariance $C_{x,y}$ are given in equation (1), where $x, y \in \mathbb{R}^d$. The central parameters used in this study are summarized in Table 1.

Linear factor models
A linear factor model describes the target data set as a loading-weighted sum of factors (e.g., McNeil et al. (2015)). Each time-series $X_b \in \mathbb{R}^d$, where $X_b$ is the b:th column of matrix X, b = 1, ..., N, is represented as a sum of products of factor loadings $L_{bj} \in \mathbb{R}$ and factors $F_j \in \mathbb{R}^d$, that is,

$$X_b = \sum_{j=1}^{k} L_{bj} F_j + \epsilon_b, \tag{2}$$

where $\epsilon_b$ is an idiosyncratic risk component and k is the number of factors. Since we collect the observations into columns of X and F, formula (2) is written in matrix form as $X = F L^T + \epsilon$.
Factors in F may or may not be directly observable in the market data. For observed factor time-series, it suffices to project the data onto the factors to get the loadings. For unobserved factor time-series, some method such as PCA or some other optimization-like method is required to find the factors and their loadings.
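For observed factors, the projection step can be illustrated with ordinary least squares. This is a sketch under assumptions: the helper name `loadings_by_projection` and the synthetic data are hypothetical, and least squares is used here as one common way to project data onto factors that need not be exactly orthonormal.

```python
import numpy as np


def loadings_by_projection(X, F):
    """Least-squares loadings L (N x k) such that F @ L.T approximates X (d x N)."""
    LT, *_ = np.linalg.lstsq(F, X, rcond=None)
    return LT.T


# Synthetic example: data generated from known factors plus small noise.
rng = np.random.default_rng(1)
d, N, k = 200, 5, 3
F = rng.standard_normal((d, k))              # observed factor time-series
L_true = rng.standard_normal((N, k))         # loadings used to build the data
X = F @ L_true.T + 0.01 * rng.standard_normal((d, N))

L_hat = loadings_by_projection(X, F)
print(np.max(np.abs(L_hat - L_true)))        # small recovery error
```

When the factors are exactly orthonormal, the least-squares solution reduces to the plain inner products F.T @ X.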

Random projection
Random factors are here chosen using the random projection method. The key idea of random projection is based on the Johnson-Lindenstrauss lemma (Johnson and Lindenstrauss, 1984; Bingham and Mannila, 2001): if points in a high-dimensional space are projected onto a randomly selected subspace of suitably high dimension, then the distances between the points are approximately preserved. A suitably high-dimensional subspace has dimension proportional to $\log(N)/\varepsilon^2$, where N is the number of time-series and ε the desired accuracy (Dasgupta and Gupta, 2003).
A large variety of probability distributions can be used to construct the projection matrix $B \in \mathbb{R}^{k \times d}$ (more on this in Section 3.3). The most obvious choice is to assume that matrix B is drawn from the matrix-variate normal distribution with independent entries, that is, from $N_{k \times d}(0, 1_k \otimes 1_d)$ (Gupta and Nagar, 1999). Then each element is N(0, 1)-distributed and independent of the other elements.
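The distance-preservation property can be illustrated with a Gaussian matrix B. This is a minimal sketch, assuming the conventional Johnson-Lindenstrauss scaling 1/sqrt(k) for the k-dimensional image BX; that scaling and the helper `pairwise_dists` are illustrative assumptions and differ from the normalization used for the random factor model itself.

```python
import numpy as np

rng = np.random.default_rng(2)
d, N, k = 2000, 20, 400

X = rng.standard_normal((d, N))      # N points in R^d, stored as columns
B = rng.standard_normal((k, d))      # Gaussian projection matrix, entries N(0, 1)
Y = (B @ X) / np.sqrt(k)             # projected points in R^k (assumed scaling)


def pairwise_dists(M):
    """Euclidean distances between all pairs of columns of M."""
    G = M.T @ M
    sq = np.diag(G)
    D2 = sq[:, None] + sq[None, :] - 2.0 * G
    return np.sqrt(np.maximum(D2, 0.0))


iu = np.triu_indices(N, 1)           # each pair counted once
ratio = pairwise_dists(Y)[iu] / pairwise_dists(X)[iu]
max_rel_err = float(np.max(np.abs(ratio - 1.0)))
print(f"largest relative distortion over {ratio.size} pairs: {max_rel_err:.3f}")
```

Increasing k tightens the ratios around one, in line with the dependence of the required dimension on the desired accuracy.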

Definition and properties
We define the random factor model (RFM) for data set $X \in \mathbb{R}^{d \times N}$ via a projection $P : \mathbb{R}^{d \times N} \to \mathbb{R}^{d \times N}$,

$$P X = a B^T B X,$$

where $a > 0$ is a normalization constant. We set

$$F = a' B^T, \qquad L = \frac{a}{a'} X^T B^T,$$

where $a' > 0$ is a constant related to factor normalization, as discussed in Section 2.4.3. Then $F \in \mathbb{R}^{d \times k}$ behaves as a matrix of k d-dimensional factor time-series. It is worth stressing that matrix F consists of random time-series that in no way depend on the data. $L \in \mathbb{R}^{N \times k}$ is a matrix of k factor loadings for the N time-series. Projection P can be factored as

$$P X = F L^T. \tag{5}$$

Defining $\epsilon^* = X - P X$ yields an approximate factorization for data matrix X,

$$X = F L^T + \epsilon^*. \tag{6}$$

We will analyze equation (6), and in particular the error term $\epsilon^*$, further in the following. Equation (6) shows that data matrix X can be approximately decomposed into a product of two components.
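The factorization P X = F L^T can be sketched in a few lines. The code below is an illustration under assumptions: it takes F = a' B^T and L = (a/a') X^T B^T, which is the split implied by P X = a B^T B X, and uses the normalizations a = 1/sqrt(k(k + d)) and a' = 1/sqrt(d) that appear later in the text. The function name `random_factor_model` and the synthetic data are hypothetical.

```python
import numpy as np


def random_factor_model(X, k, rng):
    """Random factors F (d x k) and loadings L (N x k) for zero-mean data X (d x N)."""
    d, _ = X.shape
    a = 1.0 / np.sqrt(k * (k + d))       # covariance-preserving normalization
    a_prime = 1.0 / np.sqrt(d)           # factor normalization
    B = rng.standard_normal((k, d))      # random matrix, independent of the data
    F = a_prime * B.T                    # factors do not depend on X
    L = (a / a_prime) * (B @ X).T        # loadings: data projected onto the factors
    return F, L, B, a


rng = np.random.default_rng(3)
d, N, k = 300, 8, 50
X = rng.standard_normal((d, N))
X -= X.mean(axis=0)                      # each time-series averaged to zero

F, L, B, a = random_factor_model(X, k, rng)
PX = F @ L.T                             # factor representation of the data
eps_star = X - PX                        # residual of the approximate factorization
print(F.shape, L.shape, float(np.linalg.norm(eps_star)))
```

Only the loadings depend on the data; the same F could be reused for any other data set with d observations.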
As an aside, let us mention that we could equally well have considered a random projection in the equity direction instead of the above time-series direction. This can be accomplished using a matrix $Q = a R^T R$, where $a > 0$ and $R \in \mathbb{R}^{k \times N}$ is a random matrix, and then considering XQ as the projected matrix. This naturally leads to a factor model interpretation with a loading matrix $(a/a') R^T$ and a factor matrix $a' X R^T$. In analogy to the earlier terminology, this model could be called a random loading model. The properties proven later for the random projection P then immediately carry over to the projection Q; one merely needs to replace "d" with "N" in all of the results.
However, from the point of view of the time-series, the two projection methods PX and XQ could behave differently. For instance, if there are more pronounced correlations between different equities at a fixed time than between the same equity at two different times, then one would expect to need larger values of k in the projection XQ than in the projection PX to reach the same level of accuracy in the approximation. It is also possible to apply both random projections simultaneously and study PXQ instead of PX or XQ. This double-sided projection would still have properties very similar to the one-sided projections, as long as the random matrices B and R are chosen independently of each other. Since the three alternatives are on a technical level very similar, we focus only on the choice PX in the following.
As the next step, we need to find a suitable constant a so that the standard deviation, covariance and expected value of the data are preserved, if possible. Under these conditions, $\epsilon^*$ should be close to zero. Different choices of a yield slightly different properties for the RFM, but it turns out that we cannot satisfy all these requirements at the same time if we base matrix B on the normal distribution (see footnote 3).
Here we concentrate on preserving the covariance matrix $C_{x,y}$ in the projection. Then normalization constant $a > 0$ must be such that the expectation with respect to $N_{k \times d}(0, 1_k \otimes 1_d)$ satisfies

$$\mathbb{E}[C_{Px,Py}] = C_{x,y} \tag{7}$$

for any zero-mean vectors $x, y \in \mathbb{R}^{d \times 1}$. It is worth stressing that the expectation in equation (7) is taken over random factor models, not over time-series x and y. Theorem A.1 in the Appendix shows that this is possible, but only if we choose $a = 1/\sqrt{k(k + d)}$. Let a have this value from this point on.
The expected covariance between time-series x and y is then preserved, regardless of the number of factors used. Since $\mathbb{E}[\sigma^2_{Px}] = \mathbb{E}[C_{Px,Px}] = C_{x,x} = \sigma^2_x$, our choice of a also preserves time-series variance. This result shows that an RFM is expected to fulfill the consistency requirement of variance, that is, it shows that

$$\lim_{d \to \infty} \mathbb{E}[\sigma^2_{Px}] = \lim_{d \to \infty} \sigma^2_{x,d} = \sigma^2_{x,\mathrm{pop}},$$

where $\sigma^2_{x,\mathrm{pop}}$ is the population variance and $\sigma^2_{x,d}$ is the sample variance in dimension d. An application of Jensen's inequality implies that $\mathbb{E}[\sigma_{Px}] \le \sigma_x$, that is, volatility is not overestimated.

Footnote 3: For example, it follows from the results proven in the Appendix that, when $a = 1/k$, representation (2) is exact on average, in the sense that $\mathbb{E}[PX] = X$. However, this representation over-estimates the sample variance $\sigma^2_x$ of a time-series $x \in \mathbb{R}^{d \times 1}$, since then $\mathbb{E}[\sigma^2_{Px}] = (1 + d/k)\,\sigma^2_x$. Hence, the asset returns would fluctuate too much in this normalization in the typical regime where $k \ll d$. In addition, although the projection would then produce the correct time-series on average, the actual values are dominated by fluctuations: the standard deviation of $(Px)_m$ is at least $\sigma_x \sqrt{d/k}$, and thus one given sample of the random factors is unlikely to be a useful representation of the data, unless k is at least comparable to d.
Representation (6) always preserves the average of a zero-mean vector $x \in \mathbb{R}^d$, that is, $\mathbb{E}[\mu_{Px}] = 0$. In contrast, the m:th observation $x_m$ of time-series x has an expectation

$$\mathbb{E}[(Px)_m] = \sqrt{\frac{k}{k + d}}\; x_m.$$

For a small number of factors, the mapping to $(Px)_m$ will on average underestimate the original value $x_m$, since $\sqrt{k/(k + d)} < 1$. In the limit of a large number of factors, $(Px)_m$ approaches $x_m$, since

$$\lim_{k \to \infty} \sqrt{\frac{k}{k + d}} = 1$$

and the standard deviation of $(Px)_m$ is $O(\sqrt{d/(d + k)})$ and thus goes to zero when $k \to \infty$. Hence, an RFM reproduces any vector $x \in \mathbb{R}^d$ component-by-component in the limit of a large number of factors, for $k \gg d$. Thus $\epsilon^*$ of equation (6) approaches zero when the number of factors increases.
The RFM is expected to reproduce the mean, variance and covariance of time-series x. Component-wise, the random factor model is expected to converge to the observed component values in the limit of a large number of factors.
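These expectation properties can be checked by averaging over many random factor models. The following Monte Carlo sketch assumes Gaussian B, the normalization a = 1/sqrt(k(k + d)), and a sample covariance with divisor d - 1; the divisor, the sizes and the helper `sample_cov` are assumptions made for illustration. It also compares the average of one projected component with sqrt(k/(k + d)) times its original value.

```python
import numpy as np

rng = np.random.default_rng(4)
d, k, n_models = 100, 10, 4000

x = rng.standard_normal(d)
x -= x.mean()
y = 0.5 * x + rng.standard_normal(d)
y -= y.mean()


def sample_cov(u, v):
    """Sample covariance with divisor d - 1 (an assumed convention)."""
    return np.sum((u - u.mean()) * (v - v.mean())) / (len(u) - 1)


a = 1.0 / np.sqrt(k * (k + d))
covs = np.empty(n_models)
first = np.empty(n_models)
for i in range(n_models):
    B = rng.standard_normal((k, d))      # one random factor model
    Px = a * (B.T @ (B @ x))
    Py = a * (B.T @ (B @ y))
    covs[i] = sample_cov(Px, Py)
    first[i] = Px[0]

shrink = np.sqrt(k / (k + d))
print("covariance:", sample_cov(x, y), "ensemble mean:", covs.mean())
print("component :", shrink * x[0], "ensemble mean:", first.mean())
```

Individual models scatter widely around these averages when k is small; only the ensemble means are expected to match.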

Covariance preservation
Equation (7) does not state that each RFM always preserves the covariance matrix. Nevertheless, it is reasonable to expect that an RFM approximately preserves the covariance matrix. Next we analyze how well an RFM typically preserves the covariance matrix.
But first, it is worth recalling that the Johnson-Lindenstrauss theorem (Johnson and Lindenstrauss, 1984; Dasgupta and Gupta, 2003; Matoušek, 2008) gives probabilistic bounds for the accuracy of distance preservation in the random projection. A number of versions of the Johnson-Lindenstrauss theorem have been proven; however, in all versions known to us, it is assumed that the random variables have zero expectation.
Matrix $B^T B \in \mathbb{R}^{d \times d}$ is a singular Wishart matrix (also known as an anti-Wishart matrix), which has non-zero expectation, d − k zero eigenvalues and k non-zero eigenvalues. Since matrix $B^T B$ has non-zero expectation, it was not a priori clear whether a Johnson-Lindenstrauss type theorem holds. Theorem A.1 proven in the Appendix fills this gap for the present type of anti-Wishart matrices, and it also contains a detailed derivation of the above expectation values for an arbitrary value of the scaling parameter a.
We have collected in Corollary A.2 the corresponding results for the choice that preserves the sample covariance matrices in expectation, $a = 1/\sqrt{k(k + d)}$. The precise control of fluctuations in the covariance estimates requires nontrivial combinatorial computations, given in the Appendix. As proven in Corollary A.2, for every $b > 0$ and non-random vectors $u, v \in \mathbb{R}^d$,

$$\mathbb{P}\left( \left| C_{Pu,Pv} - C_{u,v} \right| \ge b \right) \le \frac{8\, \sigma_u^2 \sigma_v^2}{k\, b^2}. \tag{10}$$

Inequality (10) gives bounds on the accuracy of covariance preservation for an arbitrary random factor model. Here the probability is taken with respect to an ensemble of random factor models. Hence, if $\sigma_u^2, \sigma_v^2 \le 1$, the probability that the covariance of vectors u and v is preserved in a random factor model more accurately than bound b is at least $1 - 8/(k b^2)$, where k is the number of factors. In general, we can set $b = \varepsilon \sigma_u \sigma_v$, with $\varepsilon > 0$, and also conclude that the accuracy, relative to the sample variance scale $\sigma_u \sigma_v$, is at least ε with a probability of at least $1 - 8/(k \varepsilon^2)$. For the bound to be informative, it is necessary that $\varepsilon > k^{-1/2}$. The error in the covariance estimate decreases at least inversely with the number of factors in almost any random factor model. Given a sufficient number of factors, the covariance of any two time-series can be approximated with an arbitrarily high accuracy using an RFM. This and the fact that random factors are in no way fitted to the data suggest that the typical accuracy of an RFM depends mainly on the number of factors k.
Corollary A.2 also gives a bound on how accurately the correlation between projected vectors is preserved. Correlation Corr(u, v) coincides with covariance $C_{u,v}$ when $\sigma_u = \sigma_v = 1$. Then inequality (10) gives a lower bound on how well $C_{Pu,Pv}$ approximates the correlation between u and v.
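These results can be summarized as a statement about the projected matrix PX as follows:

$$\mathbb{P}\left( \left| C_{PX_b,PX_c} - C_{X_b,X_c} \right| \ge \varepsilon\, \sigma_{X_b} \sigma_{X_c} \right) \le \frac{8}{k\, \varepsilon^2},$$

valid for any b, c = 1, 2, ..., N and ε > 0.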
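The probabilistic bound 1 − 8/(kε²) can be compared with the empirical behavior of an ensemble of random factor models. The sketch below assumes Gaussian B, a = 1/sqrt(k(k + d)) and a sample covariance with divisor d − 1; the sizes, the value of ε and the helper `sample_cov` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
d, k, n_models, eps = 200, 200, 500, 0.5

u = rng.standard_normal(d)
u -= u.mean()
v = 0.3 * u + rng.standard_normal(d)
v -= v.mean()


def sample_cov(p, q):
    """Sample covariance with divisor d - 1 (an assumed convention)."""
    return np.sum((p - p.mean()) * (q - q.mean())) / (len(p) - 1)


a = 1.0 / np.sqrt(k * (k + d))
c_uv = sample_cov(u, v)
scale = np.sqrt(sample_cov(u, u) * sample_cov(v, v))   # sigma_u * sigma_v

errors = np.empty(n_models)
for i in range(n_models):
    B = rng.standard_normal((k, d))                    # one random factor model
    Pu = a * (B.T @ (B @ u))
    Pv = a * (B.T @ (B @ v))
    errors[i] = abs(sample_cov(Pu, Pv) - c_uv)

violation_rate = float(np.mean(errors >= eps * scale))
bound = 8.0 / (k * eps**2)                             # upper bound on the failure probability
print(f"empirical failure rate {violation_rate:.3f} vs bound {bound:.3f}")
```

Because the bound is of Chebyshev type, the observed failure rate is typically far below it.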

Almost orthogonality
Orthogonality is a desirable property of a factor set. An orthogonalization procedure can be used to obtain an orthogonal factor set, but orthogonalization is computationally expensive. Fortunately, orthogonalization is not a necessary step in the RFM.
Given any two random factors (as defined above), they are expected to be orthogonal, that is,

$$\mathbb{E}\Big[ \sum_{m=1}^{d} F_{mj'} F_{mj} \Big] = a'^2\, d\, \delta_{j'j}.$$

This shows that with the choice $a' = 1/\sqrt{d}$, the factors $F_{j'}$ and $F_j$ are expected to be orthonormal as a consequence of the properties of normally distributed random variables.
Using Theorem A.1 we can also compute the variance of the inner product. This yields

$$\mathrm{Var}\Big[ \sum_{m=1}^{d} F_{mj'} F_{mj} \Big] = \frac{1 + \delta_{j'j}}{d}.$$

Higher cumulants approach zero even more rapidly, as can be seen by analyzing the cumulant generating function and its series expansion in λ. When $j = j'$, the nth cumulant is of order $O(d^{1-n})$. When $j \ne j'$, cumulants are of order $O(d^{1-n})$ for even n and zero otherwise. Convergence to the normal distribution in the limit of large d then follows by the standard arguments (e.g., Billingsley (1995)). The inner product matrix is approximately distributed as

$$(F^T F)_{j'j} \sim N\Big( \delta_{j'j},\; \frac{1 + \delta_{j'j}}{d} \Big).$$

When d is large (≫ 1000), the standard deviation is only a fraction of the expectation for diagonal elements. Fluctuations around zero are small for non-diagonal elements. The cumulant expansion shows that the factors are almost orthonormal even at a relatively low dimension.
In addition, the factors are on average orthogonal to the error term $\epsilon^*$: since $\epsilon^*$ is an even polynomial of the elements of B and $F_j$ is linear in them, we have $\mathbb{E}\big[ \sum_{m=1}^{d} F_{mj}\, \epsilon^*_{mb} \big] = 0$ for all j = 1, 2, ..., k and b = 1, 2, ..., N.
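The almost-orthonormality of the factors can be inspected directly from the inner product matrix. The sketch below assumes Gaussian factors with the normalization a' = 1/sqrt(d); under that assumption the diagonal entries of F^T F are scaled chi-squared variables and the off-diagonal entries are sums of products of independent normals. The sizes and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
d, k = 2000, 50

F = rng.standard_normal((d, k)) / np.sqrt(d)   # k random factors, a' = 1/sqrt(d)
G = F.T @ F                                    # inner product matrix (k x k)

diag = np.diag(G)
off = G[~np.eye(k, dtype=bool)]                # all off-diagonal entries

print("diagonal     mean, std:", diag.mean(), diag.std())
print("off-diagonal mean, std:", off.mean(), off.std())
print("reference std values  :", np.sqrt(2.0 / d), np.sqrt(1.0 / d))
```

The printed standard deviations should be close to sqrt(2/d) on the diagonal and sqrt(1/d) off the diagonal, which is why no explicit orthogonalization step is needed for large d.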

Principal component analysis
PCA is a well-known technique which uses a linear transformation to form a simplified data set retaining the characteristics of the original data set (see, e.g., Johnson et al. (2014)). In investment risk measurement, PCA is used to explain the covariance structure of a set of market variables through a few linear combinations of these variables. The general objectives of using PCA are to reduce the dimensions of covariance matrices and to find the main risk factors. The risk factors can then be used to analyze, e.g., the investment risk sources of a portfolio, or to predict how the value of the portfolio will develop.
Projection to principal components is most directly obtained using the singular value decomposition (SVD; Golub and Van Loan, 2012). Given data matrix $X \in \mathbb{R}^{d \times N}$, SVD decomposes it as $X = P_L D P_R^T$, where $P_L \in \mathbb{R}^{d \times d}$ is the matrix of left singular vectors, $P_R \in \mathbb{R}^{N \times N}$ is the matrix of right singular vectors, and $D \in \mathbb{R}^{d \times N}$ is the rectangular diagonal matrix of singular values. The PCA-based factor representation of X is given by $X = F L^T$, where $L = P_R \in \mathbb{R}^{N \times N}$ gives the factor loading matrix and $F = P_L D \in \mathbb{R}^{d \times N}$ defines the factors of the N equities. When reducing the dimensions of the original dataset, the first k principal components with the largest eigenvalues are chosen to represent the original dataset. This yields an approximation of the data matrix using a subset of factors. A k-factor approximation of matrix X is given by $F^{(k)} (L^{(k)})^T$, where $L^{(k)} \in \mathbb{R}^{N \times k}$ contains the first k factor loadings and $F^{(k)} \in \mathbb{R}^{d \times k}$ contains the components of the first k factors from $P_L D$. It can be shown that, in the mean-square-error sense, PCA gives the best linear k-factor approximation to matrix X (e.g., Reris and Brooks (2005); Eckart and Young (1936)). Principal components correspond to directions along which there is most variation in the data. However, there are no guarantees that pair-wise distances are preserved in PCA.
PCA yields the relative importance of the most important risk sources (defined in factor matrix F) in an investment portfolio. The relative importance of risk factors is shown by the size of the eigenvalues. The eigenvectors with the highest eigenvalues correspond to the most important risk factors. Loadings then tell how much investment instruments depend on these factors.
Nevertheless, it should be stated that PCA aims to capture total variation, not correlations (Johnson et al., 2014).
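The k-factor PCA representation described above can be sketched with NumPy's thin SVD. The function name `pca_factor_model` and the synthetic data are illustrative assumptions; the thin SVD returns only the first min(d, N) singular vectors, which is sufficient for the truncated representation.

```python
import numpy as np


def pca_factor_model(X, k):
    """First k PCA factors F_k (d x k) and loadings L_k (N x k) of X (d x N)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    F_k = U[:, :k] * s[:k]       # columns of P_L D: left singular vectors times singular values
    L_k = Vt[:k].T               # first k right singular vectors
    return F_k, L_k, s


rng = np.random.default_rng(7)
d, N = 300, 12
X = rng.standard_normal((d, N))
X -= X.mean(axis=0)

errors = []
for k in (1, 4, 12):
    F_k, L_k, s = pca_factor_model(X, k)
    errors.append(float(np.linalg.norm(X - F_k @ L_k.T)))   # Frobenius norm of the residual
print(errors)
```

The residual norm equals the root of the sum of the discarded squared singular values, so it decreases with k and vanishes when all components are kept.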

Comparison of factor models
Despite appearances, the RFM and PCA share many features. In both models, the data can be represented as $F L^T$, where L contains k factor loadings and F defines k factor time-series with d observations. In PCA, the most important eigenvectors are found by choosing the largest eigenvalues. No such ordering is available for random vectors. A random vector is essentially as good as the next random vector as a factor.
The RFM has almost orthonormal factors, while PCA yields strictly orthonormal factors. After finding the factors, both the RFM and PCA project the data onto these vectors. The ways in which the RFM and PCA end up with representations of the data matrix are quite different: in PCA, data is projected along principal components (factors) and only the desired set of these projections (loadings) is kept. In the RFM, the data is projected along the random factors. The main difference is in the way that factors are chosen.
PCA requires $O(d^2 N) + O(d^3)$ operations, while the RFM requires $O(kdN)$ operations, given the factor time-series. Since the number of factors is typically significantly smaller than the dimension of the data, the RFM is computationally much more efficient than PCA.
We do not aim at proving the supremacy of the RFM over PCA. Rather, we use PCA as a yardstick against which the RFM is compared. It is worth remembering that there is no fitting to data in the RFM, so one could reasonably expect that PCA would surpass the RFM in every respect in data experiments.

3 Application to the Russell 3,000 equity index

The Russell 3,000 index (ticker RAY in Bloomberg) measures the performance of the 3,000 largest US companies. The index represents about 98 percent of the investable US equity market. Here, we investigate how well a random factor model reproduces log-returns of the Russell 3,000 equities and their cross-correlations, and compare the results to those obtained using PCA.
For our analyses, we employ daily log-returns of Russell 3,000 equities from 2000-01-03 to 2013-02-20 (in total 3,305 observations). This interval contains several phases of the business cycle and certain special events, e.g., the crash of September 2008. Of the 3,000 constituent time-series in the index, we used a subset of 1,591 time-series with continuous daily data covering the entire period. To apply the analysis methods, the data is normalized by subtracting the mean of each return time-series and by dividing by its standard deviation.
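The normalization step can be sketched as follows. The price matrix here is synthetic and the function name `standardized_log_returns` is hypothetical; the use of the sample standard deviation with divisor d − 1 is also an assumption, since the divisor is not restated at this point in the text.

```python
import numpy as np


def standardized_log_returns(prices):
    """Log-returns of a (T x N) price matrix, centered and scaled column by column."""
    log_returns = np.diff(np.log(prices), axis=0)         # (T - 1) x N daily log-returns
    centered = log_returns - log_returns.mean(axis=0)     # subtract the mean of each series
    return centered / centered.std(axis=0, ddof=1)        # divide by its standard deviation


# Synthetic prices standing in for equity data (illustration only).
rng = np.random.default_rng(8)
steps = 0.01 * rng.standard_normal((250, 4))
prices = 100.0 * np.exp(np.cumsum(steps, axis=0))

X = standardized_log_returns(prices)
print(X.shape, X.mean(axis=0), X.std(axis=0, ddof=1))
```

After this step every column has zero mean and unit sample variance, so covariances and correlation coefficients of the normalized data coincide.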

Reproduction of time-series
Fig. 1 provides three examples of time-series reproduction using the RFM and PCA. The RFM (gray solid curve in Fig. 1) provides a good reproduction of the single time-series even with a low number of factors. The accuracy of the reproduction improves with the number of factors: the agreement of the RFM and the data is very good with 500 factors.
The number of factors is not limited to the number of time-series in the RFM, since the random factors do not necessarily span the entire space in which time-series may have values. Only in the limit of a large number of factors is the entire space covered.
Both PCA and the RFM provide good reproductions of the data (Fig. 1); however, there are deviations from the data in each reproduction. In the root mean square error (RMSE) sense, PCA gives a better reproduction of the time-series than the RFM (Fig. 2A). The RMSE in the reproduction of the entire data set is 0.79 in PCA vs 1.37 in the RFM with 10 factors (Fig. 1A).

Volatility
The RFM reproduces the volatility of the time-series almost exactly even with a small number of factors, while in PCA volatility estimates improve markedly with more factors (Fig. 2B). Since the volatility of each time-series is normalized to 1 separately, the accuracy of volatility reproduction is relative to the volatilities of the underlying time-series in Fig. 2B. In the RFM, the error in volatility is about 3.1 percent of volatility with ten factors. In PCA, the error is about 41.7 percent of volatility with ten factors. Accuracy increases until 1,000 factors are reached, after which essentially no error is observed in PCA. While the RFM reproduces the overall volatility of the equity time-series faithfully, it does not capture the time-dependence of volatility particularly well (data not shown).

Correlation coefficient
Fig. 3 shows the accuracy of reproduction of correlation coefficients in all analyzed pairs of stocks. In the RFM, the median error converges rapidly to zero with only a few factors. The 25th and 75th percentiles of error converge toward zero when the number of factors increases. Together these three curves form a funnel (Fig. 3A) that rapidly converges toward zero. This shows that the typical accuracy of the correlation coefficient reproduction improves rapidly with the number of factors. Still, some noise persists even with the "full" set of factors, for k = d.
In PCA, the median error approaches zero only with around 1,000 factors, which is largely a consequence of PCA significantly underestimating the volatilities of time-series. The 25th and 75th percentiles concentrate around the median away from zero in PCA.
Fig. 3B shows results on the absolute error in the correlation coefficient as a function of the number of factors. In the RFM, correlation estimates converge toward the exact value when the number of factors is increased; however, convergence is less rapid than in the analysis shown in Fig. 3A. This is a result of the fact that the error can be in either direction in the RFM. Compared with PCA, correlation estimates in the RFM converge significantly more rapidly toward the exact value. Since the error is always in the same direction in PCA, there are no differences between absolute error and relative error in PCA-based analyses.
The RFM provides a more accurate description of correlation coefficients than PCA when the number of factors is less than about 500. Noise inherent in the random factor model has the consequence that the error in correlation estimates does not disappear in the RFM even with the full set of variables, even though the median estimate rapidly converges toward the observed correlation.

Fig. 3. (A) Median error (solid gray curves; measured in percentage points) in correlation coefficient estimates in all pairs in the data-set estimated using 1,000 different random factor models, together with the 25th and 75th (dashed gray curves) percentiles of error in correlation estimates. The results are compared with the estimates of correlation based on PCA (solid black curve), together with the 25th and 75th (dashed black curves) percentiles. The results are shown as a function of the number of factors (abscissa). (B) Median absolute error of the random factor model (solid gray curve; measured in percentage points), together with the 25th and 75th percentiles (dashed gray curves); median error in PCA (solid black curve) and the 25th and 75th percentiles (dashed black curves).
The cross-over to the regime where PCA is more accurate occurs around 700-800 factors (Fig. 3). When the number of factors is very high, PCA gives as good as or better correlation coefficients than the median estimate from the RFM. A factor model with this large a number of factors is of little use in practical applications.

Covariance
The median error in covariance estimates converges rapidly toward zero in the RFM. The 25th and 75th percentiles form a funnel that converges toward zero when the number of factors increases (Fig. 4A). Despite the fact that PCA is worse than the RFM in reproducing correlation coefficients, PCA gives a better reproduction of the covariance matrix (Fig. 4B).

Impact of the market factor
The risk in the equity market is often dominated by a single factor known as the market risk factor (e.g., Sharpe, 1964). To better analyze the other possible risk factors, we subtract the first principal component, corresponding to the market risk factor, from the data and reanalyze the remaining data (the "reduced data").
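Subtracting the first principal component amounts to removing the leading rank-one term of the singular value decomposition. The sketch below uses synthetic data with a single dominant common factor; the function name `remove_first_component` and the data-generating choices are assumptions for illustration only.

```python
import numpy as np


def remove_first_component(X):
    """Subtract the leading rank-one SVD term (first principal component) from X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    market_part = s[0] * np.outer(U[:, 0], Vt[0])
    return X - market_part, U[:, 0], s


# Synthetic returns driven by one common "market" series plus independent noise.
rng = np.random.default_rng(9)
d, N = 400, 10
market = rng.standard_normal(d)
betas = 1.0 + 0.2 * rng.standard_normal(N)
X = np.outer(market, betas) + 0.5 * rng.standard_normal((d, N))
X -= X.mean(axis=0)

X_reduced, u1, s = remove_first_component(X)
s_reduced = np.linalg.svd(X_reduced, compute_uv=False)
print("leading singular values before:", s[:2], "after:", s_reduced[:2])
```

The reduced data is orthogonal to the removed component, and its largest singular value equals the second singular value of the original data.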
Fig. 5A shows that PCA becomes more accurate in reproducing the correlation coefficients when the impact of the market risk factor is removed from the data. Perhaps more surprisingly, reproduction of the data structure becomes equally accurate in the RFM and in PCA with respect to both error measures in the correlation coefficient (Fig. 5A and 5B). This suggests that the RFM and PCA contain equal amounts of information about the correlations.
As a further check, we generated random data by sampling the normal probability distribution N (0, 1) repeatedly.Fig. 6 shows that the accuracy of both the RFM and PCA is almost identical in this case.Comparison with Figures 5A and 5B shows that the accuracy of reproduction of the "reduced" Russell correlations does not significantly differ from the accuracy of the random data.This indicates that the fluctuations around the market risk factor are largely a product of independent "noise" contributions.
Removal of the market risk factor from the data also influences the accuracy of covariance reproduction. PCA is again more accurate than the RFM in covariance reproduction (Fig. 5C and 5D). In this case, the median error of covariance matrix reproduction does not deviate from zero in the RFM, and the 25th and 75th percentiles lie almost symmetrically around the x-axis.

Universality
A number of probability distributions have been found useful in the random projection method (e.g., Achlioptas (2003); Kaski (1998)). Matoušek (2008) found that almost any probability distribution with zero mean, unit variance, and subgaussian tail fulfills the requirements of the Johnson-Lindenstrauss theorem. These findings suggest that it does not matter much which probability distribution is used in the random projections. To find out whether this is the case in an RFM, we reanalyze the data using RFMs based on six different probability distributions. We also discuss some lowest-order effects of varying the probability distribution, as well as reasons why deviation from a Gaussian distribution leads only to small corrections, in Remark A.4 after the proof in the Appendix.

Probability distributions
The six probability distributions that we employ here are two sparse matrix models of Achlioptas (2003), a column-normalized Gaussian model, a row-normalized Gaussian model, the baseline Gaussian model (defined in Sec. 2.4) and a uniform model. In each case, the probability distribution is symmetric with respect to the origin and such that the expectation is zero. Each probability distribution also has a subgaussian tail. These RFMs differ from the baseline Gaussian RFM only by the construction of the random projection matrix B, and by the normalization.

Coin-flipping distributions
The simplest specification for random projection is the "random coin-flipping" algorithm of Achlioptas (2003). It is defined by choosing each element $B_{pq}$ of matrix $B$ independently according to the rule: set $B_{pq} = +1$ with probability 0.5 and set $B_{pq} = -1$ with probability 0.5. The second random projection that Achlioptas (2003) proposes is based on a sparser projection matrix defined by: set $B_{pq} = +1$ with probability 1/6, set $B_{pq} = 0$ with probability 2/3 and set $B_{pq} = -1$ with probability 1/6. Again, each element is chosen independently of the other elements. Based on these random projections, we can define two RFMs.
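The two rules above can be sampled directly. The sketch below is illustrative only: the function names, the matrix shape, and the seed are assumptions, and the normalization constant that each RFM applies afterwards is deliberately left out.

```python
import numpy as np

def coin_flip_matrix(k, d, rng):
    """Dense rule: each element is +1 or -1, each with probability 1/2."""
    return rng.choice([-1.0, 1.0], size=(k, d))

def sparse_coin_flip_matrix(k, d, rng):
    """Sparse rule: +1 with prob. 1/6, 0 with prob. 2/3, -1 with prob. 1/6."""
    return rng.choice([-1.0, 0.0, 1.0], size=(k, d), p=[1 / 6, 2 / 3, 1 / 6])

rng = np.random.default_rng(1)
B_dense = coin_flip_matrix(200, 1000, rng)    # shape chosen for illustration
B_sparse = sparse_coin_flip_matrix(200, 1000, rng)

# Both rules give zero-mean elements; the element variance is 1 for the
# dense rule and 1/3 for the sparse rule (before any normalization).
print(B_dense.mean(), B_dense.var())
print(B_sparse.mean(), B_sparse.var())
```

Because the unnormalized element variances differ (1 versus 1/3), the two models need different normalization constants to be comparable with the baseline Gaussian model.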

Gaussian and uniform distributions
In addition to the baseline Gaussian RFM, we analyze two different RFMs based on the normal distribution. In the first RFM, matrix $B$ is based on the spherical uniform distribution. The elements of matrix $B$ are defined by $B_{ml} = z_{ml}/\sqrt{Z}$, where $z_{ml} \sim N(0, 1)$ are independent and $Z = \sum_p |z_{pl}|^2$. In this RFM, columns of matrix $B$ are normalized in such a way that their length is exactly one.
Due to normalization of the columns of matrix $B$, diagonal elements of matrix $B^T B \in \mathbb{R}^{d \times d}$ behave as in an orthogonal matrix. Then $(B^T B)_{mm} = 1$ for all $m = 1, 2, \ldots, k$. Non-diagonal elements of $B^T B$ have zero expectation and variance proportional to $1/d$ (Kaski, 1998). Hence, non-diagonal elements of $B^T B$ are approximately distributed according to a zero-mean normal distribution at a relatively low dimension. Therefore $B^T B = 1 + \epsilon$, where $\epsilon \in \mathbb{R}^{d \times d}$ has non-zero elements only off the diagonal, $\mathbb{E}[\epsilon] = 0$ and, as noted above, the variance of each off-diagonal element is proportional to $1/d$.
The second RFM based on the Gaussian probability distribution is a variation on the theme: instead of the column-normalization of the first model, rows of projection matrix $B$ are normalized to unit length. This is the only difference between the two RFMs, but it is sufficient to require a different normalization constant.
The sixth considered RFM is defined by projection matrix $B$, which is based on the continuous uniform probability distribution. Each element in the projection matrix $B$ is chosen independently from the uniform distribution on the interval $[-1, 1]$, that is, $B_{mn} \sim U(-1, 1)$ for each $m, n$.
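A short numerical sketch of these three constructions is given below. The matrix orientation and sizes (`n_rows`, `n_cols`) are assumptions chosen for illustration and are not the paper's values; in this sketch the off-diagonal variance of the Gram matrix scales as the inverse of the column length, here `n_rows`.

```python
import numpy as np

rng = np.random.default_rng(2)
n_rows, n_cols = 400, 50  # illustrative sizes (assumption)

z = rng.normal(size=(n_rows, n_cols))
B_col = z / np.linalg.norm(z, axis=0, keepdims=True)   # unit-length columns
B_row = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-length rows
B_uni = rng.uniform(-1.0, 1.0, size=(n_rows, n_cols))  # U(-1, 1) elements

# Gram matrix of the column-normalized model: ones on the diagonal,
# small zero-mean fluctuations off the diagonal.
gram = B_col.T @ B_col
off_diag = gram[~np.eye(n_cols, dtype=bool)]

print(np.allclose(np.diag(gram), 1.0))
print(off_diag.mean())           # close to zero
print(off_diag.var() * n_rows)   # close to one: variance ~ 1 / column length
```

The same check applied to `B_row @ B_row.T` gives the analogous statement for the row-normalized model, with the roles of rows and columns exchanged.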

Universality of distributions
Fig. 7 shows that all six RFMs produce almost equally accurate results. To reduce noise, Fig. 7 shows results averaged over 50 sample runs. When the number of factors exceeds 10, all RFMs produce almost identical median accuracy. The only exception is the column-normalized Gaussian model, which deviates from the other RFMs when the number of factors is less than 5. All the other RFMs produce identical results in this regime as well. The accuracy of the 25th and 75th percentiles depends mainly on the number of factors and only weakly on the way the factors are generated.
The results suggest that the details of how the projection matrix is specified are not very important. Almost any sufficiently regular construction of the random projection matrix (when properly normalized) produces a factor model that approximately preserves the correlation structure. The main requirement seems to be that matrix elements are chosen randomly and independently of other matrix elements. This supports the view that the RFM represents quite well how the bulk of factor models would describe the analyzed task.

Randomness of factors
We set out to analyze the impact of random selection of factors on a linear factor model. We were interested in whether and how randomness in the choice of factors impacts the reproduction of long equity time-series and, in particular, whether their interdependence is preserved. We found that the accuracy of a typical random factor model is respectable; in particular, the correlation matrix is well preserved in the reproduction of time-series (Section 4). We also derived novel theoretical results on the accuracy of a random factor model (Appendix A).
It may seem unlikely that a factor model with randomly-chosen factors could be used for any kind of factor modeling. One reason an RFM can capture the details of an equity time-series is that random factors are, as a consequence of the independence of elements, almost orthogonal to each other. Furthermore, the number of almost orthogonal vectors is higher in a higher-dimensional space, which reduces the impact of the "curse of dimensionality" (Bellman, 1957; Indyk and Motwani, 1998) and thereby makes data representation more feasible. A suitably high number of random factors will then span a subspace sufficient to capture the return time-series at the desired accuracy.
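The near-orthogonality of independent random vectors is easy to observe numerically. The following sketch is only an illustration: the helper `typical_abs_cosine`, the number of vectors, and the dimensions tried are assumptions, and Gaussian elements are used for convenience.

```python
import numpy as np

rng = np.random.default_rng(3)

def typical_abs_cosine(dim, n_vectors=50):
    """Median |cos(angle)| over all pairs of independent Gaussian vectors."""
    v = rng.normal(size=(n_vectors, dim))
    v /= np.linalg.norm(v, axis=1, keepdims=True)   # normalize to unit length
    cos = v @ v.T
    pairs = cos[np.triu_indices(n_vectors, k=1)]    # each pair counted once
    return np.median(np.abs(pairs))

for dim in (10, 100, 1000, 10000):
    print(dim, typical_abs_cosine(dim))
```

The typical absolute cosine shrinks roughly as the inverse square root of the dimension, so for long time-series a set of random factors is close to an orthogonal set.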
On the other hand, the number of factors is not bounded in a random factor model. Only in the limit of an infinite number of factors is an RFM guaranteed to reproduce the original time-series perfectly. This can be viewed as a disadvantage of using an RFM.

Universality
In a classical factor model, only a few factors are statistically significant. The explanatory power of each factor should then be large. In a statistical factor model, a larger number of factors is often used, which also has the consequence that a larger ambiguity in the choice of factors is encountered (Ledoit and Wolf, 2004). Several different sets of factors may provide an almost equally good fit to the data. In an RFM, each factor has only a small explanatory power, which suggests that a large number of factor sets provide essentially equally good descriptions of the data and its structure. This was observed in our computational experiments.
The number of random factors seems to be more important than fine-tuning of the random factor time-series. The way an RFM is constructed is not important as long as the elements of projection B are independently drawn from a suitably regular probability distribution with zero expectation and subgaussian tails. Regardless of the probability distribution used, we obtained almost identically accurate results. These findings suggest that a kind of universality of RFMs is present, at least with respect to correlation coefficients. The results are largely dominated by a set of typical RFMs that have a rather similar accuracy of data reproduction. We have called this set of factor models the bulk.
The analysis of the proof of Theorem A.1 (see Remark A.4) supports the view that universality is present with respect to probability distributions. The assumption that the probability distribution is Gaussian is not necessarily required in the theorem. It suffices to assume independence of the random matrix elements, and it is likely that this requirement can be relaxed further.

Accuracy, revisited
In our analyses, we employed PCA as a yardstick against which we compared the RFM. The RFM described correlation structures and volatility well, but individual data points of time-series were reproduced less accurately. PCA reproduced individual data points of time-series more accurately, but its reproduction of cross-correlations of the time-series was less accurate, mainly due to underestimated volatility. In other words, the RFM preserved the structure of the data but not necessarily the details of single time-series, while the PCA representation preserved the details but not necessarily the correlation structure.
It is worth pointing out that PCA is fitted to preserve the covariance of time-series, not correlations. An RFM is not fitted to the data in any way, so the preservation of correlations is quite unexpected.
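The mechanism behind this can be illustrated with the random projection step alone. The sketch below is not the full RFM defined in the paper: it only projects two synthetic, correlated series with a Gaussian matrix that is drawn without looking at the data, and compares the correlation before and after. The series length `T`, the number of random factors `k`, and the toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
T, k = 2000, 200  # series length and number of random factors (illustrative)

# Two centered series sharing a common component (true correlation 0.5).
common = rng.normal(size=T)
x = common + rng.normal(size=T)
y = common + rng.normal(size=T)
x -= x.mean()
y -= y.mean()

def cosine(a, b):
    """Cosine of the angle between a and b; equals correlation for centered data."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

B = rng.normal(size=(k, T))  # random projection, not fitted to x or y
print(cosine(x, y), cosine(B @ x, B @ y))
```

The two printed values agree up to a fluctuation that shrinks as `k` grows, even though `B` carries no information about the data.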
The previous literature has compared the performance of the random projection method with PCA, and found results similar to ours. Bingham and Mannila (2001) found that the random projection method performed significantly better than PCA in the compression of image data and in text clustering. Goal et al. (2005) found that random projection compares favorably with PCA, although PCA is more accurate with a small number of dimensions. Tang et al. (2005) found that in text clustering a PCA-based method provides better accuracy with a small number of dimensions, while with a high number of dimensions the random projection method dominated. Deegalla and Bostrom (2006) found that in five image data sets and five microarray data sets, PCA dominated with a small number of dimensions but its performance deteriorated when the dimension of the data increased (the cross-over occurs at 15-150 dimensions depending on the data set), while random projection dominated at a high number of dimensions. Our findings are consistent with the results of these previous studies.
The realm of RFMs is the domain of huge data sets consisting of a large number of long time-series. An RFM answers the question: how many factors will a generic linear factor model require to describe the data at a specific accuracy?
For any $m = 1, 2, \ldots, d$, the definitions yield

Since $B_{jm}$ are i.i.d. centered, normalized Gaussian, this implies

where $\mathbb{1}_{\{n=m\}}$ stands for an indicator function having value 1 when $n = m$, and 0 otherwise. Thus

The usefulness of Wick polynomials lies in the property that their products satisfy the same moments-to-cumulants expansion as simple products, with the additional rule that any partition with a cluster of indices inside one of the Wick polynomials will be missing from the expansion. For instance, for any random variables $x_1, x_2, x_3, x_4$ (which need not be independent nor Gaussian), one has

where $\kappa$ denotes a cumulant. Here, for instance, $\kappa[x_1, x_3] = \mathrm{Cov}(x_1, x_3)$. Applying this to the $B$-variables yields

since for Gaussian random variables the fourth cumulant is equal to zero. Therefore, by (20) and (22), we have

and thus, in particular,

Therefore, we have now proven the first two items of the Theorem.

The combinatorics gets progressively heavier in the remaining two items. Let us begin with the scalar product

where akv m denote the centered variables. Taking an expectation and using (23) for $m' = m$ thus yields

The definition of $C$ reads explicitly

To compute its expectation, we still need to evaluate

Therefore,

where in the last step we have used the identity u

For the final result, let us assume in addition that $\mu_u = \mu_v = 0$. To avoid iterated Wick polynomials, let us begin with

and express the two terms separately in Wick form. Namely, now

where the product of four $B$-factors can be expanded using the Wick polynomial expansion (17). Since only expectations of products of an even number of $B$s can be non-zero, we obtain

where we have applied the assumptions $\mu_u = 0 = \mu_v$. Similarly, since

and by (31) we obtain a Wick polynomial expansion:

This concludes the proof of the theorem.
Remark A.3 The exact bound in (46) can also be approximated in other ways. Choosing the normalization as in the Corollary, with $a^2 = 1/(k(d + k))$, and assuming $d \gg k$, we obtain the estimate $2\sigma_u^2 \sigma_v^2 / k$ for $\mathrm{Var}(C)$, with a reduction of the prefactor from 8 to 2. The bound stated in the Theorem becomes optimal in the opposite regime, when $k \gg d$.

Remark A.4
The assumption about sufficiently fast decay of correlations between the matrix elements, here taken to be i.i.d., is important for the above phenomena to occur. However, the precise statistics of the distribution of each matrix element play a much smaller role. For instance, consider taking the matrix elements not from the Gaussian $N(0, 1)$ distribution but from some other distribution which has finite moments up to order four.
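As a concrete reference point, the proof above uses the fact that the fourth cumulant of a Gaussian variable is zero. The fourth cumulants of the element distributions considered in this paper, after rescaling each to unit variance, can be computed exactly from their second and fourth moments. The sketch below does only this arithmetic; the function name `excess_kurtosis` and the dictionary layout are assumptions, and it makes no claim about how these values enter the bounds of the Theorem.

```python
from fractions import Fraction

def excess_kurtosis(m2, m4):
    """Fourth cumulant of a zero-mean variable after rescaling to unit variance."""
    return m4 / m2**2 - 3

# (second moment, fourth moment) of a single zero-mean matrix element
moments = {
    "gaussian N(0,1)":        (Fraction(1), Fraction(3)),
    "coin flip +1/-1":        (Fraction(1), Fraction(1)),
    "sparse +1/0/-1 (1/6, 2/3, 1/6)": (Fraction(1, 3), Fraction(1, 3)),
    "uniform U(-1,1)":        (Fraction(1, 3), Fraction(1, 5)),
}

for name, (m2, m4) in moments.items():
    print(name, excess_kurtosis(m2, m4))
```

The sparse coin-flipping rule has the same normalized fourth cumulant as the Gaussian (zero), while the dense coin-flipping rule and the uniform distribution give -2 and -6/5, respectively.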

Figure 1: An example reproduction of logarithmic return time-series using the random factor model (dark gray curves) and PCA (dashed light gray curves) compared with the data (solid black curves) using (A) 10 factors, (B) 100 factors, and (C) 500 factors. The data is normalized to have zero average return, as described in the text.

Figure 2: Accuracy of time-series representations. (A) Error in time-series reproduction using the random factor model (dashed gray curve) and PCA (black solid curve) measured by RMSE. Curves are shown as functions of the number of factors. (B) Error in reproduction of time-series volatility using the random factor model (dashed gray curve) and PCA (black solid curve) as a function of the number of factors. Errors are relative to the volatility of the time-series due to normalization.

Figure 3: Accuracy of the correlation modeling. (A) Median error (solid gray curves; measured in percentage points) in correlation coefficient estimates in all pairs in the data-set estimated using 1,000 different random factor models, together with the 25th and 75th (dashed gray curves) percentiles of error in correlation estimates. The results are compared with the estimates of correlation based on PCA (solid black curve), together with the 25th and 75th (dashed black curves) percentiles. The results are shown as a function of the number of factors (abscissa). (B) Median absolute error of the random factor model (solid gray curve; measured in percentage points), together with the 25th and 75th percentiles (dashed gray curves); median error in PCA (solid black curve) and the 25th and 75th percentiles (dashed black curves).

Figure 4: Accuracy of covariance estimation. (A) Median error (solid gray curves; measured in percentage points) in covariance in all pairs in the data-set estimated using 1,000 different random factor models, together with the 25th and 75th (dashed gray curves) percentiles of error in covariance estimates. The results are compared with the estimates of covariance based on PCA (solid black curve), together with the 25th and 75th (dashed black curves) percentiles. The results are shown as a function of the number of factors (abscissa). (B) Median absolute error (solid gray curves; measured in percentage points) in covariance in all pairs in the data-set estimated using 1,000 different random factor models, together with the 25th and 75th (dashed gray curves) percentiles of error. The results are compared with the estimates of covariance based on PCA (solid black curve), together with the 25th and 75th (dashed black curves) percentiles.

Figure 5: Analysis of the reduced data set in which the influence of the market is removed. (A) Error in correlation coefficient, (B) absolute error in correlation coefficient estimates, (C) error in covariance, and (D) absolute error in covariance estimates in a random factor model (gray curves) and in PCA (black curves). Solid lines are median estimates, dashed lines 25th and 75th percentiles.

Figure 6: Reproduction of correlation coefficient in randomly-generated data. (A) Error in correlation reproduction of the random data using the random factor model (dashed gray curve) and PCA (black solid curve). Curves are shown as functions of the number of factors. Differences are in percentage points. (B) Absolute error in correlation reproduction of the random data using the random factor model (dashed gray curve) and PCA (black solid curve).

Figure 7: Comparison of six different projection matrix specifications. Solid lines are median estimates, dashed lines the 25th and 75th percentiles. (A) Error in correlation coefficient estimates as functions of the number of factors in six models. Error is computed from the entire set of correlation pairs in 1,591 time-series. (B) Absolute error in correlation coefficient estimates in the six models as functions of the number of factors.

[ m ] = akµ u

As $\mathbb{E}[B_{jm}] = 0$, centering the variable $Z_m$ yields the following simple Wick polynomial expansion: Z m − [Z m ] = a jm B jn : .

:: B jm B jn B j ′ m B j ′ n ′ : B jm : . (35)

Combining the above results together finally yields a Wick polynomial expansion for the centered $C$: B jm B jn B j ′ m B j ′ n ′ : B jm B jn B j ′ m ′ B j ′ n ′ : B jm ′ : . (36)

We use this formula to compute $\mathrm{Var}(C) = \mathbb{E}[(C - \mathbb{E}[C])^2]$. In the expanded formula, terms containing a product of Wick polynomials of different degrees yield zero, since whichever three pairings are used for the six $B$-factors, one of these pairings connects two elements inside the degree-four Wick polynomial. Hence, for instance, $\mathbb{E}[{:}B_{j_1 m_1} B_{j_1 n_1} B_{j'_1 m'_1} B_{j'_1 n'_1}{:}\;{:}B_{j_2 m_2} B_{j_2 m'_2}{:}] = 0$. The products of second-order terms turn out to yield the dominant contribution. After first taking out a factor $a^4 d^{-2} (d - 1)^{-2}$, it reads explicitly d(2k + d + 1)

Table 1: The central parameters used.