**August 18, 2015**

**by Michael Edesess and Kwok L. Tsui**

A paper by Campbell R. Harvey, Yan Liu, and Heqing Zhu with the unexciting title “…and the Cross-Section of Expected Returns” has been drawing a lot of attention recently. This is not only because of the paper’s startling revelation that at least 315 “factors” of investment returns have been discovered through multiple regressions and published in hundreds of articles and working papers (a phenomenon already characterized in a 2011 paper by John H. Cochrane, who called it a “zoo” of factors). It is also because Harvey et al.’s work sounds the same alarm for finance research that was sounded in 2005 for medical research in a famous article by John P. A. Ioannidis, “Why Most Published Research Findings Are False.”

Harvey et al. claim that “most claimed research findings in financial economics are likely false.” I’ll explain how they arrived at that conclusion by looking at the ongoing search for factors that influence investment returns. Let’s begin by understanding the statistical tests that are used to determine whether or not a purported factor is the result of random “luck.”

[drizzle]

**What are factors?**

A factor is any item of data that can be put into correspondence with the rates of return on a security or a portfolio of securities.

Let’s call that item of data *x*, and let’s call the monthly return on the security or securities *r*. Imagine two columns of numbers: the values of *x* and the values of *r*. Each value of *x*, the factor, corresponds to a value of *r*, a monthly rate of return. The value of *x* and the value of *r* might both be for the same month or for the same security. Or *x* and *r* might be for different months.

A factor could be a macroeconomic variable, like monthly inflation, interest rates or unemployment. In that case, a monthly return on the portfolio or security could be put into correspondence with, say, the value of the macroeconomic variable in the preceding month. Imagine a column of monthly returns on a portfolio next to a column of unemployment rates in the previous month.
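To make the "two columns" picture concrete, here is a minimal Python sketch of lining up each monthly return with the factor value from the preceding month. The numbers are invented purely for illustration, and the helper name `pair_with_lag` is hypothetical, not something taken from Harvey et al.

```python
# Hypothetical monthly data, invented purely for illustration.
unemployment = [5.7, 5.5, 5.4, 5.4, 5.6]  # factor x, in percent
returns = [-3.0, 5.5, -1.7, 0.9, 1.0]     # portfolio return r, in percent


def pair_with_lag(x, r, lag=1):
    """Pair each return with the factor value observed `lag` months earlier."""
    if len(x) != len(r):
        raise ValueError("x and r must cover the same months")
    if lag < 0 or lag >= len(x):
        raise ValueError("lag must be between 0 and len(x) - 1")
    # Drop the last `lag` factor values and the first `lag` returns,
    # so that x from month t sits next to r from month t + lag.
    return list(zip(x[:len(x) - lag], r[lag:]))


pairs = pair_with_lag(unemployment, returns, lag=1)
for x_value, r_value in pairs:
    print(f"x (previous month) = {x_value:4.1f}   r = {r_value:5.1f}")
```

With a one-month lag, five months of data yield only four usable pairs, because the first return has no preceding factor value in the sample.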

Alternatively, a factor could be a feature of the corporate security or portfolio of securities itself. For example, a factor could be the size of the corporation, the corporation’s price/earnings ratio or book value/market value ratio. Imagine a column of rates of return on all stocks in a given month next to a column of the price/earnings ratios of the companies issuing those stocks.

Whatever the factor is, the idea is to see if it bears a numerical relation to the corresponding rates of return when the two variables are lined up that way. A correlation coefficient would be one way to see if the data exhibits a numerical relationship. An equivalent method – one that has become standard – is to run a regression of the rates of return on the “factor” variable.
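The sense in which the two methods are equivalent can be sketched in a few lines of Python. With a single factor, the regression slope equals the correlation coefficient multiplied by the ratio of the spread of *r* to the spread of *x*, so the two always share the same sign and are zero together. The data below are made up for illustration, and the function names are my own.

```python
from math import sqrt

# Hypothetical factor values and returns, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r = [4.0, 2.0, 8.0, 6.0, 10.0]


def sums_of_squares(x, r):
    """Return (Sxy, Sxx, Srr): sums of cross-products and squared deviations."""
    mean_x = sum(x) / len(x)
    mean_r = sum(r) / len(r)
    sxy = sum((a - mean_x) * (b - mean_r) for a, b in zip(x, r))
    sxx = sum((a - mean_x) ** 2 for a in x)
    srr = sum((b - mean_r) ** 2 for b in r)
    return sxy, sxx, srr


def correlation(x, r):
    """Pearson correlation coefficient between x and r."""
    sxy, sxx, srr = sums_of_squares(x, r)
    return sxy / sqrt(sxx * srr)


def regression_slope(x, r):
    """Least-squares slope from regressing r on x."""
    sxy, sxx, _ = sums_of_squares(x, r)
    return sxy / sxx


sxy, sxx, srr = sums_of_squares(x, r)
corr = correlation(x, r)
slope = regression_slope(x, r)

# The slope is the correlation rescaled by the relative spread of r and x.
rescaled_corr = corr * sqrt(srr / sxx)
print(corr, slope, rescaled_corr)
```

The correlation is a unit-free number between -1 and 1, while the slope is expressed in units of return per unit of the factor, which is one reason the regression form is convenient.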

A regression is what is known to high school students – at least it was in my high school – as a least-squares fit. It results in a line with a slope and a y-intercept (Figure 1).

**Figure 1.**

The slope, often labeled with a “beta,” is supposed to represent the relationship. If the beta is zero, there is no apparent relationship (at least not a linear one). If beta is large in magnitude – either a steep positive or a steep negative slope – there appears to be a relationship.
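Here is a small Python sketch of the least-squares fit, using two made-up data sets. The first lies exactly on a line, so the fit recovers its slope. The second follows a perfect U-shape, yet its beta is zero, which illustrates the caveat that a zero beta rules out only a linear relationship. The function name `least_squares_fit` is hypothetical.

```python
def least_squares_fit(x, r):
    """Fit r = intercept + beta * x by ordinary least squares.

    Returns the pair (beta, intercept).
    """
    mean_x = sum(x) / len(x)
    mean_r = sum(r) / len(r)
    sxy = sum((a - mean_x) * (b - mean_r) for a, b in zip(x, r))
    sxx = sum((a - mean_x) ** 2 for a in x)
    beta = sxy / sxx
    intercept = mean_r - beta * mean_x
    return beta, intercept


# Hypothetical factor values, symmetric around zero.
x = [-2, -1, 0, 1, 2]

# Returns that lie exactly on the line r = 1 + 0.5 * x.
r_linear = [1 + 0.5 * v for v in x]

# Returns that depend on x perfectly, but through a U-shape: r = x squared.
r_curved = [v ** 2 for v in x]

beta_linear, intercept_linear = least_squares_fit(x, r_linear)
beta_curved, intercept_curved = least_squares_fit(x, r_curved)

print(beta_linear, intercept_linear)  # slope 0.5, intercept 1.0
print(beta_curved, intercept_curved)  # slope 0.0, intercept 2.0
```

In the second case the fitted line is flat at the average return of 2, even though *r* is completely determined by *x*.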

[/drizzle]