Greenline Research: Why Data mining is a huge risk with risk factor-based investment strategies

Article by Maneesh Shanbhag, CFA – Greenline Research

Executive Summary

  • Data mining is a huge risk with risk factor-based investment strategies. Many factors have proven to not work in practice and even the most popular factors, like Value and Momentum, may prove less effective going forward
  • Crowding in factor strategies, changes in the economy, and new business models may eliminate any potential excess return from simple screening metrics that form the basis for many factors
  • Investors can avoid being fooled by backtests by always keeping in mind that most attempts to beat markets will fail because trading is a zero-sum activity

We believe there are cause and effect relationships in the world and in investing that hold true over time. Many are common sense and easily observable - like fire creates smoke - while others are harder to see and understand. With investing, true relationships can be hard to see because of randomness and noise in data, and there’s a risk we convince ourselves certain relationships exist that really do not (e.g. smoke creates fire). In much of quantitative finance, data is mined to show a certain effect, but the logic behind the cause and effect relationship is not robust. Then suddenly, because of evidence in noisy historical data, investors begin to believe that smoke creates fire. For us, when historical evidence disagrees with our logic, we always favor applying our fundamental understanding over what a backtest prescribes.

Factor investing is an area we have researched and written about extensively [1, 2, 3]. It is an approach to active management that is lower cost and backed by decades of historical data, compared to the standard high-cost, overdiversified approach that has failed. But it is still active management, and with active management there is a loser for every winner. That’s a fact of markets. Normally the few winners in active management leverage a few key insights that are not recognized by the masses, at least at the time. In contrast, most active management losers tend to copy each other, using the same strategies and managers.

Factor investing is, by its nature, transparent and therefore easily copied. This is why many factor investing strategies are increasingly concerning to us. Data mining, factor crowding, as well as economic changes are all reasons why such strategies may disappoint in the future. We use popular Value and Momentum strategies as examples throughout this thought piece to illustrate. Keep in mind, we are not trying to definitively say that such factor strategies do not work, but instead hoping potential users of these strategies will pause and ask deeper questions about them. In the end, we can never forget the unavoidable fact that trading and beating markets is a zero-sum game.

Data mining is a risk even with Value and Momentum strategies

Value is the buying of “cheap” assets, at least based on measures such as a low price-to-earnings (P/E) ratio for stocks. This is the opposite of Growth, or high P/E stocks, which are statistically expensive. Typically, the way a stock becomes relatively cheap is by underperforming in the recent past, and vice versa for Growth stocks. Momentum strategies can be compared similarly. The Past Winners (Momentum) strategy buys stocks that have recently outperformed, on the basis that their trend will continue. Its opposite is Past Losers (low momentum). The table below summarizes these four strategies and how they logically relate to each other.

But the logic explained above is at odds with the academic research backed by decades of market data across time and geography. The summary below shows the difference between what the backtests say and what logic says. Both cannot simultaneously be true. Smoke cannot both be created by fire (logical) and produce it (evidence from backtests).

If our logic is correct, then we should see that Value is highly correlated with Past Losers, and Momentum with Growth. To test this, we simulate portfolios over the last 25 years where Value is defined as the ⅓ of the market with the lowest P/E, Growth is the ⅓ of the market with the highest P/E, Past Winners (high momentum) is the ⅓ of the market with the best trailing 200-day returns, and Past Losers is the ⅓ of the market with the worst trailing 200-day returns. We use the Bloomberg equity database for the simulation, but our results are similar to what we find in the commonly used Ken French Data Library. Some might ask why we test over only 25 years when more data is available. Markets and economies change and adapt to new information. While we have tested these strategies going back to the 1920s, more recent data is more relevant to today’s market conditions, a point we also discuss later in this paper.
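The tercile sort described above can be sketched roughly as follows. This is a minimal illustration, not the simulation used in the article: the function name `tercile_portfolios`, the tickers, and all numbers are hypothetical, and it assumes every stock has a positive P/E (how loss-making companies are handled is not stated in the text).

```python
from typing import Dict, List


def tercile_portfolios(scores: Dict[str, float]) -> Dict[str, List[str]]:
    """Split tickers into bottom, middle and top thirds by ascending score."""
    # Ties are broken alphabetically by ticker so the result is deterministic.
    ranked = sorted(scores, key=lambda ticker: (scores[ticker], ticker))
    cut = len(ranked) // 3
    return {
        "low": ranked[:cut],
        "mid": ranked[cut:len(ranked) - cut],
        "high": ranked[len(ranked) - cut:],
    }


# Hypothetical inputs, for illustration only (not market data).
pe_ratio = {"A": 8.0, "B": 12.0, "C": 15.0, "D": 22.0, "E": 30.0, "F": 45.0}
trailing_200d = {"A": -0.10, "B": 0.02, "C": 0.25, "D": -0.05, "E": 0.18, "F": 0.30}

pe_sort = tercile_portfolios(pe_ratio)
momentum_sort = tercile_portfolios(trailing_200d)

value, growth = pe_sort["low"], pe_sort["high"]
past_losers, past_winners = momentum_sort["low"], momentum_sort["high"]
print(value, growth, past_losers, past_winners)
```

In a real test the sort would be repeated at each rebalance date, and the overlap between the Value and Past Losers lists is one simple way to see how closely the two screens track each other.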

The charts below compare returns over rolling 12-month periods for Value versus Past Losers, and Momentum versus Growth. To isolate each strategy's excess return, we subtract the return of the S&P 500 from the returns of Value, Growth, Momentum, and Past Losers. Two things stand out. First, our logic appears correct: there is a high correlation between Value and Past Losers most of the time, and likewise between Growth and Momentum. This partly explains why Value and Momentum are negatively correlated: they select for opposite segments of the market (i.e. Value and Growth should be negatively correlated). Second, most of the return difference between the strategies arose during a single, isolated market event, the bubble and bust of 1998-2002. This was an anomalous period in markets when any strategy that was underweight technology stocks outperformed. Outside of this period, these strategies mostly delivered tracking error relative to a broad index like the S&P 500.
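For readers who want to reproduce this kind of comparison, the calculation can be sketched as below: compound monthly returns into rolling 12-month returns, subtract the benchmark, and correlate the resulting excess-return series. The function names and the monthly return series are made up for illustration and are an assumption on our part, not the data or code behind the charts.

```python
from math import sqrt
from typing import List


def rolling_return(monthly: List[float], window: int = 12) -> List[float]:
    """Compound monthly returns over each trailing `window`-month period."""
    out = []
    for end in range(window, len(monthly) + 1):
        growth = 1.0
        for r in monthly[end - window:end]:
            growth *= 1.0 + r
        out.append(growth - 1.0)
    return out


def excess_return(strategy: List[float], benchmark: List[float]) -> List[float]:
    """Strategy return minus benchmark return, period by period."""
    return [s - b for s, b in zip(strategy, benchmark)]


def correlation(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up monthly returns (24 months), for illustration only.
market = [0.01] * 24
value = [0.01 + 0.002 * ((i % 5) - 2) for i in range(24)]
past_losers = [0.01 + 0.003 * ((i % 5) - 2) for i in range(24)]

market_12m = rolling_return(market)
value_excess = excess_return(rolling_return(value), market_12m)
losers_excess = excess_return(rolling_return(past_losers), market_12m)
print(round(correlation(value_excess, losers_excess), 3))
```

Because rolling windows overlap, consecutive observations are not independent, so a correlation computed this way is descriptive rather than a formal statistical test.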

Selecting a stock based on a high P/E ratio is clearly different from selecting one based on its recent past performance. That is the point here. Despite these differences, the nature of their excess returns is similar and therefore concerning. For example, Past Winners selects solely based on past price moves and is a high turnover strategy (4-8x/yr). Value, on the other hand, selects based on accounting measures, like earnings, and is relatively low turnover (<1x/yr).

As an aside, this suggests it may be better to define the Value Premium as long low P/E stocks (Value) and short Past Losers, instead of the traditional definition of long Value and short Growth. This alternative construction would extract any premium from selecting cheap assets based on valuation instead of past price performance. Similarly, the Momentum Premium may be better defined as long Past Winners and short Growth stocks. A topic for further study.

This pattern of factor strategies outperforming during extreme market periods also supports our prior research that they are best used as screens to help avoid losers [4].

Combining Value and Momentum has episodic outperformance versus Value alone

If Momentum (Past Winners) is similar to Growth, then combining Value and Momentum should not be a good idea, as they will cancel each other out. Put another way, combining Value and Growth is like buying the whole market.

To illustrate, we simulate a portfolio that is a sequential combination of Value and Momentum (we know there are multiple techniques for combining both factors, but other methods give a similar conclusion). The chart below sorts the universe of 500 equities in the S&P 500 along their Value and Momentum scores to show how the Combined portfolio is derived.

The chart below compares the excess returns over the market of Value + Past Winners to those of Value alone. While the cumulative returns of the combined factor portfolio look amazing, we can see that its returns were largely isolated to the 1998-2002 bubble and bust period.

We therefore show the summary statistics below both since 1993 and since 2003. Since the end of the bubble and bust period, the risk-adjusted performance of Value + Past Winners has actually been worse than that of Value alone.
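One way to check whether a single episode drives a backtest is to compute the same risk-adjusted statistic over the full sample and over a sub-sample that starts after the episode. The sketch below uses the information ratio (annualized excess return divided by annualized tracking error); that choice of metric is our assumption, since the text does not name the statistic used. The helper names and the return series are hypothetical and the numbers are made up to show the mechanics, not to reproduce the article's results.

```python
from math import sqrt
from typing import List, Tuple


def information_ratio(excess_monthly: List[float]) -> float:
    """Annualized mean excess return divided by annualized tracking error."""
    n = len(excess_monthly)
    mean = sum(excess_monthly) / n
    variance = sum((r - mean) ** 2 for r in excess_monthly) / (n - 1)  # sample variance
    return (mean * 12) / (sqrt(variance) * sqrt(12))


def since(dates: List[Tuple[int, int]], values: List[float], start_year: int) -> List[float]:
    """Keep only the observations dated in `start_year` or later."""
    return [v for (year, _month), v in zip(dates, values) if year >= start_year]


def hypothetical_excess(year: int, month: int) -> float:
    """Made-up monthly excess return: alternating noise plus a one-off 2000-2002 bump."""
    noise = 0.01 if month % 2 == 0 else -0.01
    bump = 0.02 if 2000 <= year <= 2002 else 0.0
    return noise + bump


dates = [(year, month) for year in range(1993, 2018) for month in range(1, 13)]
excess = [hypothetical_excess(year, month) for year, month in dates]

ir_full = information_ratio(excess)
ir_since_2003 = information_ratio(since(dates, excess, 2003))
print(round(ir_full, 2), round(ir_since_2003, 2))
```

In this toy series the full-sample ratio looks attractive only because of the three bump years; the post-2003 ratio shows no edge at all, which is the kind of masking by summary statistics the text warns about.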

Again, our point here is only to provide food for thought on the topic of whether certain risk factor premia exist and what expectations one should use for their returns going forward. Based on their returns having largely been determined by a few historical events, a zero expected return for these factor premia would be a reasonable starting point. In the end, alpha is a zero-sum game, and never in history has a simple strategy, copied by the masses, generated meaningful outperformance. By definition, it cannot.

There are additional changes in the economy and markets that further weaken the basis for risk factor strategies, which we discuss next.

Past performance is not a guarantee of future results, especially when backtested

We purposely picked only the past 25 years for our simulations in the first section of this paper. This was around the time academics like Eugene Fama, Ken French, Joseph Lakonishok and others published the seminal papers on the Value and Momentum effects for the world to see. Since that time, many billions of dollars have been put to work in these strategies. Markets by their nature tend to price in new information and we cannot ignore this fact.

Crowding will reduce returns

Crowding with risk factor strategies is a topic we and others have commented on in the past [5]. One example of this is the elimination of the size premium. Logic says that smaller companies should, on average, outperform larger ones to compensate for their higher risk. But since institutional investors began making strategic allocations to this asset class in the early 1980s, this size premium has disappeared, as shown in the chart below of the excess return of small cap stocks minus large cap stocks.

This crowding effect will negatively impact all factor returns, as hundreds of billions of dollars or more are now deployed in these strategies. And we will only be able to definitively measure this impact many years down the road.

Changes in the economy will reduce the effectiveness of certain factor definitions

Just as markets adapt and evolve, so do business and the economy. As one example, over the last few decades the US economy has transitioned from being dependent on manufacturing and physical capital to being dependent on intangible knowledge capital in services and technology businesses. This means an increasing amount of the average company’s value is no longer reflected on its financial statements.

Today, intangibles like intellectual property and proprietary data are of great value but generally not recorded on a balance sheet, reducing the effectiveness of a measure such as book equity value.

This change in our economy explains why the price-to-book equity (P/B) measure has been ineffective for selecting value companies for over 30 years. The chart below shows the cumulative returns of Value, as measured by low P/B, since 1965. We can see that its performance relative to the market has been largely ineffective since the early 1980s, something others have confirmed as well [6].

Valuation is the anchor for markets, but it is complex. Using simple metrics like price-to-earnings ratios is only a shortcut, but one that has statistically worked, at least historically. In practice, any factor that measures the growth rate and quality of underlying cash flows should be factored into valuation. And as accounting rules change and as business models evolve, this will all determine the effectiveness of metrics that worked in the past.

To Avoid Mistakes, Have the Right Framework

In this era of big data and cheap computing power, it is easy for anyone to create a winning investment strategy in a backtest. But investing is forward looking and markets are adept at pricing in known information. Often, a strategy has a one-off experience that drives its outperformance, which is masked in summary statistics. We see this in risk factor strategies, even Value and Momentum. If these one-off events do not repeat, most factor strategies are at risk of being ineffective going forward.

Investing is complex, and there is always more randomness and more noise than signal in the data we observe. Managers who outperform, even over long periods like 10 years, often do not repeat their feat. We think applying a few logical principles to this complex problem can help cut through the noise. Two such principles we apply are that markets are hard to beat, even before fees and trading costs (since trading is a zero-sum activity), and that one should always favor a logical explanation over statistics.

“Organized common (or uncommon) sense is an enormously powerful tool. There are huge dangers with computers. People calculate too much and think too little.” - Charlie Munger

