…And The Cross-Section Of Expected Returns

Campbell R. Harvey

Duke University – Fuqua School of Business; National Bureau of Economic Research (NBER)

Yan Liu

Texas A&M University, Department of Finance

Heqing Zhu

Duke University – Fuqua School of Business

February 3, 2015

Abstract:

Hundreds of papers and hundreds of factors attempt to explain the cross-section of expected returns. Given this extensive data mining, it does not make any economic or statistical sense to use the usual significance criteria for a newly discovered factor, e.g., a t-ratio greater than 2.0. However, what hurdle should be used for current research? Our paper introduces a multiple testing framework and provides a time series of historical significance cutoffs from the first empirical tests in 1967 to today. Our new method allows for correlation among the tests as well as publication bias. We also project forward 20 years assuming the rate of factor production remains similar to the experience of the last few years. The estimation of our model suggests that today a newly discovered factor needs to clear a much higher hurdle, with a t-ratio greater than 3.0. Echoing a recent disturbing conclusion in the medical literature, we argue that most claimed research findings in financial economics are likely false.

Our key results are summarized:

…And The Cross-Section Of Expected Returns – Introduction

Over forty years ago, one of the first tests of the Capital Asset Pricing Model (CAPM) found that the market beta was a significant explanator of the cross-section of expected returns. The reported t-statistic of 2.57 in Fama and MacBeth (1973, Table III) comfortably exceeded the usual cutoff of 2.0. However, since that time, hundreds of papers have tried to explain the cross-section of expected returns. Given the known number of factors that have been tried and the reasonable assumption that many more factors have been tried but did not make it to publication, the usual cutoff levels for statistical significance may not be appropriate. We present a new framework that allows for multiple tests and derive recommended statistical significance levels for current research in asset pricing.
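To see why a fixed cutoff of 2.0 becomes too lenient as the number of tested factors grows, consider the following minimal sketch. It uses the Bonferroni correction, a standard textbook adjustment chosen here only for illustration; it is not necessarily the adjustment this paper adopts. It also assumes a large-sample normal approximation to the t-distribution. The function name and the test counts in the loop are hypothetical.

```python
from statistics import NormalDist


def bonferroni_t_cutoff(num_tests: int, alpha: float = 0.05) -> float:
    """Two-sided cutoff that holds the family-wise error rate at `alpha`.

    Assumption: the test statistic is approximately standard normal under
    the null (large-sample approximation), and the Bonferroni rule is used,
    which tests each hypothesis at level alpha / num_tests.
    """
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    per_test_alpha = alpha / num_tests
    # Two-sided test: put half of the per-test alpha in each tail.
    return NormalDist().inv_cdf(1.0 - per_test_alpha / 2.0)


# Hypothetical counts of factors tested; not figures taken from the paper.
for num_tests in (1, 10, 100, 300):
    print(num_tests, round(bonferroni_t_cutoff(num_tests), 2))
```

With a single test the cutoff is the familiar 1.96, and it rises steadily with the number of tests. Bonferroni ignores correlation among the tests, so it should be read as a conservative benchmark rather than as the hurdle the paper recommends.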

We begin with 313 papers that study cross-sectional return patterns published in a selection of journals. We provide recommended test thresholds from the first empirical tests in 1967 through to the present day. We also project minimum t-statistics through 2032, assuming the rate of “factor production” remains the same as over the last ten years. We present a taxonomy of historical factors as well as definitions.

Our research is related to a recent paper by McLean and Pontiff (2014), who argue that certain stock market anomalies are less anomalous after being published. Their paper tests the statistical biases emphasized in Leamer (1978), Ross (1989), Lo and MacKinlay (1990), Fama (1991) and Schwert (2003).

Our paper also adds to the recent literature on biases and inefficiencies in cross-sectional regression studies. Lewellen, Nagel and Shanken (2010) critique the usual practice of using cross-sectional R²s and pricing errors to judge success and show that the explanatory power of many previously documented factors is spurious. Our work focuses on evaluating the statistical significance of a factor given the previous tests on other factors. Our goal is to use a multiple testing framework both to re-evaluate past research and to provide a new benchmark for current and future research. We tackle multiple hypothesis testing from the frequentist perspective. Bayesian approaches to multiple testing and variable selection also exist. However, the high dimensionality of the problem, combined with the fact that we do not observe all the factors that have been tried, poses a big challenge for Bayesian methods. While we propose a frequentist approach to overcome this missing data issue, it is unclear how to do this in the Bayesian framework. Nonetheless, we provide a detailed discussion of Bayesian methods in the paper.

Multiple testing has only recently gained traction in the finance literature. For the literature on multiple testing corrections for data snooping biases, see Sullivan, Timmermann and White (1999, 2001) and White (2000). For research on data snooping and variable selection in predictive regressions, see Foster, Smith and Whaley (1997), Cooper and Gulen (2006) and Lynch and Vital-Ahuja (2012). For applications of the multiple testing approach in the finance literature, see, for example, Shanken (1990), Ferson and Harvey (1999), Boudoukh et al. (2007) and Patton and Timmermann (2010). More recently, a multiple testing connection has been used to study technical trading and mutual fund performance; see, for example, Barras, Scaillet and Wermers (2010), Bajgrowicz and Scaillet (2012) and Kosowski, Timmermann, White and Wermers (2006). Conrad, Cooper and Kaul (2003) point out that data snooping accounts for a large proportion of the return differential between equity portfolios that are sorted by firm characteristics. Bajgrowicz, Scaillet and Treccani (2013) show that multiple testing methods help eliminate a large proportion of spurious jumps detected using conventional test statistics for high-frequency data. Holland, Basu and Sun (2010) emphasize the importance of multiple testing in accounting research. Our paper is consistent with the theme of this literature.