**Statistical Risk Models**

__Zura Kakushadze__

Quantigic Solutions LLC; Free University of Tbilisi

__Willie Yu__

Centre for Computational Biology, Duke-NUS Medical School

February 14, 2016

**Abstract:**

We give complete algorithms and source code for constructing statistical risk models, including methods for fixing the number of risk factors. One such method is based on eRank (effective rank) and yields results similar to (and further validates) the method set forth in an earlier paper by one of us. We also give a complete algorithm and source code for computing eigenvectors and eigenvalues of a sample covariance matrix which i) requires no costly iterations and ii) involves a number of operations linear in the number of returns. The presentation is intended to be pedagogical and oriented toward practical applications.

### Statistical Risk Models – Introduction

Multifactor risk models are a popular risk management tool, e.g., in portfolio optimization. For stock portfolios, multifactor risk models in their most popular incarnations are usually constructed from industry and style risk factors. However, in some cases such constructions are unavailable, e.g., because no industry classification (or similar) exists, or because relevant style factors cannot be defined. In fact, this is generally the case when the underlying returns are not for equities but for some other “instruments”, e.g., quantitative trading alphas (expected returns).

In such cases one usually resorts to statistical risk models. These are often thought of in the context of principal components of a sample covariance (or correlation) matrix of returns. More generally, one can think of statistical risk models as models constructed solely from the time series of the underlying returns, with no additional information. The purpose of these notes is to provide a simple and pedagogical discussion of statistical risk models oriented toward practical applications.

In Section 2 we set the stage by discussing the sample covariance matrix, the generalities of factor models, and the requirement that factor models reproduce in-sample variances. We then show how a K-factor statistical risk model can be constructed simply. Start from the sample covariance (or correlation) matrix and write down its spectral representation via principal components. Truncate the sum by keeping only the first K principal components. Then compensate for the deficit in the variances (i.e., on the diagonal of the resulting matrix) by adding specific (idiosyncratic) risk. This (generally) results in a positive-definite (and thus invertible) risk model covariance matrix so long as K < M, where M + 1 is the number of observations in the time series. This holds even if M < N (N being the number of underlying returns), in which case the sample covariance matrix is singular. In fact, one of the main motivations for considering factor models in the first place is that in most practical applications M < N (and often M << N). Even if M ≥ N, in which case the sample covariance matrix is nonsingular, it is still unstable out of sample unless M >> N, which is seldom (if ever) the case in practice. Factor models are intended to reduce this instability to a degree.
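The construction just described can be sketched in Python with NumPy. This is a minimal illustration, not the paper's R code from Appendix A. The function name `statistical_risk_model`, the (N, M + 1) layout of the returns array, the normalization C = Y Y^T / M for demeaned returns Y, and the use of the covariance rather than the correlation matrix are all our assumptions.

```python
import numpy as np


def statistical_risk_model(returns, k):
    """Sketch of a K-factor statistical risk model from principal components.

    returns: array of shape (N, M + 1); rows are instruments, columns observations.
    k: number of risk factors, required to satisfy 0 < k < M.
    Returns (gamma, loadings, factor_var, specific_var).
    """
    m = returns.shape[1] - 1
    if not 0 < k < m:
        raise ValueError("need 0 < k < M")
    # Demean each series; the sample covariance then has rank at most M.
    y = returns - returns.mean(axis=1, keepdims=True)
    cov = y @ y.T / m
    # eigh returns eigenvalues in ascending order; flip to descending.
    evals, evecs = np.linalg.eigh(cov)
    evals, evecs = evals[::-1], evecs[:, ::-1]
    # Truncate the spectral sum to the first K principal components.
    loadings = evecs[:, :k]
    factor_var = evals[:k]
    factor_part = (loadings * factor_var) @ loadings.T
    # Specific variance fills the deficit on the diagonal, so the model
    # reproduces the in-sample variances exactly.
    specific_var = np.diag(cov) - np.diag(factor_part)
    gamma = factor_part + np.diag(specific_var)
    return gamma, loadings, factor_var, specific_var
```

Take N = 50 and M + 1 = 21. The sample covariance is then singular. The model matrix is nevertheless positive-definite, because each specific variance equals the variance carried by the discarded components K + 1, ..., M.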

The beauty of the statistical risk model construction is its simplicity. However, one must fix the number of risk factors K. We discuss two simple methods for fixing K in Section 3 (with variations). One is that of (Kakushadze, 2015d). The other, a very different-looking method, is based on our adaptation of eRank (effective rank) of (Roy and Vetterli, 2007) and yields results similar to (and further validates) that of (Kakushadze, 2015d). We backtest these methods out of sample using the intraday alphas of (Kakushadze, 2015a). The method of (Kakushadze, 2015d) backtests better. In Appendix A we give R source code for computing a K-factor statistical risk model with K fixed via the aforementioned two methods (with variations).
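For reference, the eRank of (Roy and Vetterli, 2007) is the exponential of the Shannon entropy of the normalized singular-value spectrum. For a covariance or correlation matrix, the singular values are its eigenvalues. The sketch below computes that quantity. The rule K = round(eRank) of the sample correlation matrix, clipped to 1 ≤ K ≤ M - 1, and the helper name `num_factors_erank` are hypothetical stand-ins. They are not necessarily the adaptation used in the paper.

```python
import numpy as np


def effective_rank(eigenvalues, tol=1e-12):
    """eRank (Roy and Vetterli, 2007): exp of the Shannon entropy of the
    normalized spectrum p_a = lambda_a / sum(lambda)."""
    lam = np.asarray(eigenvalues, dtype=float)
    # Zero modes contribute p log p -> 0; drop them (and tiny negative
    # round-off from eigvalsh) to avoid log(0).
    lam = lam[lam > tol * lam.max()]
    p = lam / lam.sum()
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))


def num_factors_erank(returns):
    """Hypothetical rule: K = round(eRank) of the sample correlation matrix,
    clipped to 1 <= K <= M - 1 so that K < M."""
    m = returns.shape[1] - 1
    corr = np.corrcoef(returns)  # rows are instruments, columns observations
    k = int(round(effective_rank(np.linalg.eigvalsh(corr))))
    return max(1, min(k, m - 1))
```

A flat spectrum of n equal eigenvalues gives eRank = n. A single nonzero eigenvalue gives eRank = 1. eRank therefore interpolates between these extremes as a continuous measure of how many directions carry the variance.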

In Section 4 we discuss how to compute principal components based on the returns. The “naive” method is the power iteration method, which is applicable to more general matrices. However, it requires iterations and is computationally costly. Because here we are dealing with sample covariance matrices, there is a simpler and faster way of computing principal components when M < N. It does not require any costly iterations and involves only a number of operations linear in the number of returns N. We discuss this method in detail in Section 4 and give R source code for it in Appendix C. The main purpose of this exercise is to set up our further discussion in Section 4, where we explain that statistical risk models are simply certain deformations of the sample covariance matrix. We then also discuss “nontraditional” statistical risk models such as shrinkage (Ledoit and Wolf, 2004). These are also deformations of the sample covariance matrix, but they involve M principal components as opposed to K < M principal components. Generally, “nontraditional” models underperform.
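When M < N, the nonzero eigenpairs of the N x N sample covariance can be obtained from the much smaller (M + 1) x (M + 1) matrix Y^T Y of demeaned returns Y. If Y^T Y u = μ u, then Y u is an eigenvector of Y Y^T with the same eigenvalue, and its squared norm is μ. The sketch below implements this standard Gram-matrix trick in NumPy. It assumes the normalization C = Y Y^T / M, and the function name is hypothetical, so it may differ in details from the paper's Appendix C code. The cost is O(N M^2 + M^3), which is linear in N. No power iterations over N-dimensional vectors are needed; the only eigensolve is on the small matrix.

```python
import numpy as np


def principal_components_small_m(returns, tol=1e-10):
    """Nonzero eigenpairs of C = Y Y^T / M (N x N), computed via the
    (M + 1) x (M + 1) Gram matrix Y^T Y. Cost O(N M^2 + M^3)."""
    m = returns.shape[1] - 1
    y = returns - returns.mean(axis=1, keepdims=True)
    gram = y.T @ y                   # (M+1) x (M+1), O(N M^2)
    mu, u = np.linalg.eigh(gram)     # small dense eigenproblem, O(M^3)
    mu, u = mu[::-1], u[:, ::-1]     # descending order
    keep = mu > tol * mu[0]          # demeaning leaves one zero mode; drop it
    mu, u = mu[keep], u[:, keep]
    # If Y^T Y u = mu u, then Y Y^T (Y u) = mu (Y u) and |Y u|^2 = mu.
    vecs = (y @ u) / np.sqrt(mu)     # unit-norm eigenvectors of C, O(N M^2)
    vals = mu / m                    # eigenvalues of C = Y Y^T / M
    return vals, vecs
```

The eigenvectors agree with those from a direct N x N eigendecomposition up to an overall sign per component. That sign ambiguity is inherent to eigenvectors and does not affect the risk model.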

We then take this a step further and explain that optimization using a statistical risk model is well-approximated by a weighted regression, where the regression is over the factor loadings matrix (i.e., the K principal components), and the weights are inverse specific variances. More precisely, this holds when the number of underlying returns N >> 1, which is the case in most applications. In fact, optimization reduces to a weighted regression for N >> 1 in a wider class of risk models that lack any “clustering” structure (we clarify the meaning of this statement in Section 4).
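To see why a weighted regression emerges, take a factor model Γ = Ξ + B Φ B^T. Here Ξ holds diagonal specific variances, B is the N x K loadings matrix, and Φ is the K x K factor covariance. As a simplifying assumption, use unconstrained mean-variance weights w ∝ Γ^{-1} E for expected returns E. By the Woodbury identity, Γ^{-1} = Ξ^{-1} - Ξ^{-1} B (Φ^{-1} + B^T Ξ^{-1} B)^{-1} B^T Ξ^{-1}. The K x K matrix B^T Ξ^{-1} B grows like N, so for N >> 1 it dominates Φ^{-1}. Dropping Φ^{-1} leaves w ≈ Ξ^{-1} ε, where ε is the residual of regressing E on B with weights 1/Ξ_ii. The sketch below compares the two numerically. The function names and the synthetic data are hypothetical, and this argument is an illustration rather than the paper's derivation.

```python
import numpy as np


def weights_exact(expected, loadings, factor_cov, specific_var):
    """Unconstrained mean-variance weights w = Gamma^{-1} E (up to normalization),
    with Gamma = diag(specific_var) + B Phi B^T."""
    gamma = np.diag(specific_var) + loadings @ factor_cov @ loadings.T
    return np.linalg.solve(gamma, expected)


def weights_regression(expected, loadings, specific_var):
    """Large-N approximation: weighted regression of E on B with weights
    1 / specific_var; returns the residuals divided by specific_var."""
    wts = 1.0 / specific_var
    wb = loadings * wts[:, None]                              # W B, W = diag(1/specific_var)
    coef = np.linalg.solve(loadings.T @ wb, wb.T @ expected)  # (B^T W B)^{-1} B^T W E
    resid = expected - loadings @ coef                        # weighted-regression residual
    return wts * resid


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n = 2000
    b = rng.standard_normal((n, 3))          # synthetic loadings
    phi = np.diag([2.0, 1.0, 0.5])           # synthetic factor covariance
    xi2 = rng.uniform(0.5, 1.5, n)           # synthetic specific variances
    e = rng.standard_normal(n) + b @ np.array([0.3, -0.2, 0.1])
    w1 = weights_exact(e, b, phi, xi2)
    w2 = weights_regression(e, b, xi2)
    print("relative difference:", np.linalg.norm(w1 - w2) / np.linalg.norm(w1))
```

The weighted residual is orthogonal to the loadings, so B^T w = 0 for the approximate weights: they carry no exposure to the K factors. The neglected Φ^{-1} term only perturbs this at relative order 1/N.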
