Campbell R. Harvey
Duke University – Fuqua School of Business; National Bureau of Economic Research (NBER)
Duke University – Fuqua School of Business
May 17, 2014
Hundreds of papers and hundreds of factors attempt to explain the cross-section of expected returns. Given this extensive data mining, it does not make any economic or statistical sense to use the usual significance criteria for a newly discovered factor, e.g., a t-ratio greater than 2.0. However, what hurdle should be used for current research? Our paper introduces a multiple testing framework and provides a time series of historical significance cutoffs from the first empirical tests in 1967 to today. We develop a new framework that allows for correlation among the tests as well as publication bias. We also project forward 20 years assuming the rate of factor production remains similar to the experience of the last few years. The estimation of our model suggests that today a newly discovered factor needs to clear a much higher hurdle, with a t-ratio greater than 3.0. Echoing a recent disturbing conclusion in the medical literature, we argue that most claimed research findings in financial economics are likely false.
Our key results are summarized:
Forty years ago, one of the first tests of the Capital Asset Pricing Model (CAPM) found that the market beta was a significant explanator of the cross-section of expected returns. The reported t-ratio of 2.57 in Fama and MacBeth (1973) comfortably exceeded the usual cutoff of 2.0. However, since that time, hundreds of papers have tried to explain the cross-section of expected returns. Given the known number of factors that have been tried and the reasonable assumption that many more factors have been tried but did not make it to publication, the usual cutoff levels for statistical significance are not appropriate. We present a new framework that allows for multiple tests and derive recommended statistical significance levels for current research in asset pricing.
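To make the intuition concrete, the sketch below shows how a significance cutoff rises with the number of tests under one standard adjustment, the Bonferroni correction. It is an illustration only: this section does not say which adjustments the paper uses, the function names are hypothetical, the t-statistic is approximated by a standard normal, and the test counts in the loop are arbitrary examples rather than figures from the paper.

```python
from statistics import NormalDist


def bonferroni_t_cutoff(num_tests: int, alpha: float = 0.05) -> float:
    """Two-sided cutoff that keeps the family-wise error rate at or below alpha.

    Assumption: the t-statistic is approximated by a standard normal,
    which is reasonable only for large samples.
    """
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    # Bonferroni: each individual test is run at level alpha / num_tests.
    per_test_alpha = alpha / num_tests
    return NormalDist().inv_cdf(1.0 - per_test_alpha / 2.0)


def prob_at_least_one_false_discovery(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of one or more false positives when every test uses level alpha.

    Assumption: the tests are independent and every null hypothesis is true.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    # Hypothetical test counts, chosen only to show the trend.
    for m in (1, 10, 100, 300):
        cutoff = bonferroni_t_cutoff(m)
        fwer = prob_at_least_one_false_discovery(m)
        print(f"tests={m:4d}  cutoff={cutoff:.2f}  "
              f"P(at least one false positive at alpha=0.05)={fwer:.3f}")
```

With a single test the cutoff is close to the familiar 2.0, and it rises steadily as the number of tests grows. Bonferroni treats the tests as a worst case and is typically conservative when they are correlated, so the values printed here should be read as a rough guide rather than as the paper's recommended hurdles.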
We begin with 312 papers that study cross-sectional return patterns published in a selection of journals. We provide recommended p-values from the first empirical tests in 1967 through the present day. We also project minimum t-ratios through 2032, assuming the rate of “factor production” remains similar to recent experience. We present a taxonomy of historical factors as well as definitions.
Our research is related to a recent paper by McLean and Pontiff (2013), who argue that certain stock market anomalies are less anomalous after being published. Their paper tests the statistical biases emphasized in Leamer (1978), Ross (1989), Lo and MacKinlay (1990), Fama (1991) and Schwert (2003).
Our paper also adds to the recent literature on biases and inefficiencies in cross-sectional regression studies. Lewellen, Nagel and Shanken (2010) critique the usual practice of using cross-sectional R²s and pricing errors to judge the success of a work and show that the explanatory powers of many previously documented factors are spurious. Balduzzi and Robotti (2008) challenge the traditional approach of estimating factor risk premia via cross-sectional regressions and advocate a factor projection approach. Our work focuses on evaluating the statistical significance of a factor given the previous tests on other factors. Our goal is to use a multiple testing framework to both re-evaluate past research and to provide a new benchmark for current and future research.
There are limitations to our framework. First, should all factor discoveries be treated equally? We think not. A factor derived from a theory should have a lower hurdle than a factor discovered from a purely empirical exercise. Nevertheless, whether suggested by theory or empirical work, a t-ratio of 2.0 is too low. Second, our analysis focuses on unconditional tests. It is possible that a particular factor is very important in certain economic environments and not important in others, in which case an unconditional test might conclude that the factor is only marginal. These two caveats need to be taken into account when using our recommended significance levels for current asset pricing research.
While our focus is on the cross-section of equity returns, our message applies to many different areas of finance. For instance, Frank and Goyal (2009) investigate around 30 variables that have been documented to explain capital structure decisions of public firms. Welch and Goyal (2004) examine the performance of a dozen variables that have been shown to predict market excess returns. These two applications are ideal settings to employ multiple testing methods.
Our paper is organized as follows. In the second section, we provide a chronology of the “discovered” factors. The third section presents a categorization of the factors. Next, we introduce some multiple testing frameworks and suggest appropriate cutoffs for both past and future asset pricing tests. Some concluding remarks are offered in the final section.