As we have mentioned before, here, here and here, there is overwhelming evidence that the number of stock anomalies in the universe is much lower than originally thought. Most previous research papers attempt to filter the 300+ past anomalies documented in the literature by applying more stringent standards, such as lower p-value thresholds (i.e., higher t-statistic hurdles) or more advanced statistical tests.
A working paper we examine below, “p-hacking: Evidence from two million trading strategies” by Chordia, Goyal and Saretto, takes an alternative approach. The authors take the Compustat universe of data points, and use every variable in the dataset to create over 2 million trading strategies — explicit data-mining!
The idea behind the paper is to examine what is possible if one simply data-mined the entire universe of signals. The authors make an effort to only examine tradeable strategies by eliminating small and micro-cap stocks. In addition, the authors apply more stringent statistical standards (which I will discuss below) to identify the true anomalies in the data.
After examining all the signals, the authors find only a handful of trading strategies that are “anomalous,” and most of these strategies make no economic sense! Now, the authors do assume (through their tests) that the Fama and French 5-factor model plus momentum explains the cross-section of stock returns (so all the classic characteristics we argue about are controlled for in the study), but the authors’ main contribution is that there is little to no evidence for additional anomalies.
However, many papers have already found this. So why is this paper important?
A newer topic that we are commonly asked about is machine learning. Many are intrigued by the idea — let the computer and its algorithms come up with the best trading signals using all the data. At first, this sounds great (and in certain contexts it can be extremely useful). But taking a step back, we need to examine what happens if we simply examine all the data. This paper highlights that trading on every signal in the fundamental-signal universe yields almost no (additional) anomalies.(1) It may be the case that machine learning is great at combining the already well-known anomalies that the authors assume in the paper (such as Value and Momentum); however, machine learning may also end up increasing frictional costs and the chances of a data-mined result (despite the algorithm’s best efforts to avoid this problem).
Below we dig into the details of the paper.
The paper examines the idea of finding anomalies in a different manner than most — it simply data mines.
Here is the high-level summary from the paper:
We consider the list of all accounting variables on Compustat and basic market variables on CRSP. We construct trading signals by considering various combinations of these basic variables and construct roughly 2.1 million different trading signals.
Two additional screens that I like from the paper are that the authors (1) eliminate all firms with stock prices below $3 as well as those below the 20th percentile for market capitalization and (2) require all the variables to have information in order to include the firm in the sample.(2)
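The two screens above can be sketched in a few lines of pandas. This is a minimal illustration on made-up data; the column names and numbers are assumptions, not the paper's actual Compustat/CRSP fields.

```python
import pandas as pd

# Hypothetical cross-section of stocks; all values are illustrative.
stocks = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "price":  [2.50, 15.00, 8.00, 45.00, 1.75],
    "mktcap": [50, 900, 120, 5000, 30],  # $ millions
})

# Screen 1: drop stocks priced below $3.
liquid = stocks[stocks["price"] >= 3.0]

# Screen 2: drop stocks below the 20th percentile of market cap
# (computed here within the remaining sample).
cap_floor = liquid["mktcap"].quantile(0.20)
tradeable = liquid[liquid["mktcap"] >= cap_floor]

print(tradeable["ticker"].tolist())  # → ['BBB', 'DDD']
```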
The paper then examines the 156 variables in the Compustat library (listed in Appendix A1 of the paper) to create over 2 million trading signals. Here is how the signals are constructed, directly from the paper:
There are 156 variables that clear our filters and can be used to develop trading signals. The list of these variables is provided in Appendix Table A1. We refer to these variables as Levels. We also construct Growth rates from one year to the next for these variables. Since it is common in the literature to construct ratios of different variables we also compute all possible combinations of ratios of two levels, denoted Ratios of two, and ratios of any two growth rates, denoted Ratios of growth rates. Finally, we also compute all possible combinations that can be expressed as a ratio between the difference of two variables to a third variable (i.e., (x1 − x2)/x3). We refer to this last group as Ratios of three. We obtain a total of 2,090,365 possible signals.
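The combinatorial construction above can be sketched with a toy variable list. The variable names below are stand-ins for the 156 Compustat items, and the exact ordered/unordered pairing conventions that produce the paper's 2,090,365 count are my assumption; the point is only to show how the signal families are generated.

```python
from itertools import permutations

# Toy stand-ins for the 156 Compustat variables (names are illustrative).
levels = ["assets", "sales", "capex", "debt"]

# Growth rates: one per level variable.
growth = [f"g_{v}" for v in levels]

# Ratios of two levels: every ordered pair x1/x2 with x1 != x2.
ratios_two = [f"{a}/{b}" for a, b in permutations(levels, 2)]

# Ratios of two growth rates, same construction.
ratios_growth = [f"{a}/{b}" for a, b in permutations(growth, 2)]

# Ratios of three: (x1 - x2)/x3 over distinct variables.
ratios_three = [f"({a}-{b})/{c}" for a, b, c in permutations(levels, 3)]

signals = levels + growth + ratios_two + ratios_growth + ratios_three
print(len(signals))  # → 56 for this 4-variable toy universe
```

With 156 variables instead of 4, the same nested combinations explode into millions of candidate signals, which is exactly the data-mining exercise the paper runs.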
Since the paper has already eliminated small- and micro-cap stocks from the tests, the authors form portfolios using a one-dimensional sort on each variable. The portfolios are rebalanced annually, creating long/short portfolios that go long the top decile on each measure and short the bottom decile.
The paper tests these 2 million portfolios by (1) regressing the L/S portfolio returns against the Fama and French 5-factor model plus the momentum factor and (2) examining Fama-MacBeth (FM) regressions.
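The first test is a standard time-series regression: the intercept of the L/S returns on the six factors is the "6-factor alpha." Below is a minimal sketch on synthetic data; the factor loadings, the 20-year sample length, and all numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated monthly data: 6 factor returns (stand-ins for MKT, SMB, HML,
# RMW, CMA, MOM) and one long/short strategy return series.
T = 240
factors = rng.normal(0.0, 0.03, size=(T, 6))
strategy = (0.002
            + factors @ np.array([0.5, 0.2, -0.1, 0.0, 0.1, 0.3])
            + rng.normal(0.0, 0.02, size=T))

# OLS of strategy returns on a constant plus the six factors;
# the intercept beta[0] is the 6-factor alpha.
X = np.column_stack([np.ones(T), factors])
beta, *_ = np.linalg.lstsq(X, strategy, rcond=None)

# Classical OLS standard errors for the t-stat on alpha.
resid = strategy - X @ beta
sigma2 = resid @ resid / (T - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)
t_alpha = beta[0] / np.sqrt(cov[0, 0])
```

A strategy "survives" this test when `t_alpha` clears the chosen hurdle; the paper's point is that with 2 million strategies, many clear it by luck.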
The Tests and Results
Before getting into the specific results, a good exercise (especially when data-mining) is to simply examine the distribution of outcomes. Figure 1 (shown below) in the paper shows the distributions and t-stats.
As seen in the distributions, most are centered around zero.(3) The question is as follows: how robust are the trading strategies with significant alphas and Fama-MacBeth coefficients?
Examining raw returns first, the paper finds 22,237 portfolios with t-stats above 2.57 (in absolute value) — this is less than 1% of the total portfolios. Next, the paper examines the 6-factor regressions and finds that around 31% of the sample has a significant alpha at the 5% level, and 17% of the sample is significant at the 1% level. Last, examining the Fama-MacBeth regressions, the paper finds similar results — 31% of the sample has a t-stat above 1.96, and 18% of the sample has a t-stat above 2.57.
Based on these independent tests (alphas and FM regressions), the results are promising. However, the authors dig into the statistics with more advanced tests.
The reason to do this, as we discussed here before, is that as the number of ideas (in our case, 2 million) increases, the probability of Type 1 Errors increases. The authors describe this well in their paper:
Classical single hypothesis testing uses a significance level α to control Type I error (discovery of false positives). In multiple hypothesis testing (MHT), using α to test each individual hypothesis does not control the overall probability of false positives. For instance, if test statistics are independent and normally distributed and we set the significance level at 5%, then the rate of Type I error (i.e., the