Google Searches And Stock Returns

Laurens Bijl
Norwegian University of Science and Technology (NTNU)

Glenn Kringhaug
Norwegian University of Science and Technology (NTNU)

Peter Molnár
Norwegian University of Science and Technology (NTNU)

Eirik Sandvik
Norwegian University of Science and Technology (NTNU)

March 30, 2016

International Review of Financial Analysis, Forthcoming

Abstract:

We investigate whether data from Google Trends can be used to forecast stock returns. Previous studies have found that high Google search volumes predict high returns for the first one to two weeks, followed by a price reversal. Using a more recent dataset covering the period from 2008 to 2013, we find that high Google search volumes lead to negative returns. We also examine a trading strategy that sells stocks with high Google search volumes and buys stocks with infrequent Google searches. This strategy is profitable before transaction costs but not after them.

Introduction

Prediction of stock returns is arguably one of the most researched subjects in finance. However, researchers have not reached agreement on two questions: first, whether it is possible to predict stock market movements, and second, what such predictability implies for our understanding of financial markets. The focus and views of researchers have changed over time. Early research was based on the efficient market hypothesis, which holds that stock prices are driven by new information and therefore follow a random walk, since new information arrives randomly (Fama, 1965). Later research has examined the efficient market hypothesis more critically (Ang and Bekaert, 2007; Burton, 2003; Campbell and Yogo, 2006; Cochrane, 2008; Lo and MacKinlay, 1988). Researchers have increasingly focused on the impact of investor sentiment (Baker and Wurgler, 2006; Barberis et al., 1998), and recent research has begun to exploit the growing availability of data from news articles (Tetlock, 2007), Twitter (Bollen et al., 2011), Wikipedia (Moat et al., 2013) and Google Trends (Damien and Ahmed, 2013; Preis et al., 2013; Preis et al., 2010).

Google records search data for all search terms that exceed a certain search-volume threshold, and historical search indices for these terms can be downloaded through the Google Trends tool. Google is by far the most popular search engine on the Web. In recent years, several researchers have used Google Trends in their work, including research on the spread of epidemics and diseases (Carneiro and Mylonakis, 2009; Ginsberg et al., 2008; Pelat et al., 2009).
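
Google Trends does not publish raw search counts. According to Google's own description of the tool, each data point is divided by the total number of searches in the same region and time range, and the resulting shares are then scaled to a 0 to 100 range. The sketch below mimics that two-step normalization on hypothetical counts; `term_counts` and `total_counts` are illustrative inputs that Google does not actually expose, so this is an approximation rather than Google's implementation.

```python
def trends_index(term_counts, total_counts):
    """Approximate a Google Trends-style 0-100 index from hypothetical raw counts."""
    # Step 1: share of all searches in the region that went to this term, per period
    shares = [term / total for term, total in zip(term_counts, total_counts)]
    peak = max(shares)
    if peak == 0:
        return [0 for _ in shares]
    # Step 2: rescale so the period with the highest share equals 100
    return [round(100 * share / peak) for share in shares]


# Raw searches for the term fall from 100 to 80, but total search activity doubles
print(trends_index([50, 100, 80], [1000, 1000, 2000]))  # [50, 100, 40]
```

Because the scaling is relative to the peak within the requested window, index values taken from separate downloads are not directly comparable with each other.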

A few attempts have been made to forecast financial markets using Google Trends data, with mixed results. Preis et al. (2010) investigate the correlation between returns and search volume for company names but do not find any significant correlation. Instead, they find strong evidence that Google search data can be used to predict trading volume. Preis et al. (2013) investigate whether general search terms related to finance can be used to predict market movements. They find that a strategy of buying or selling a market portfolio based on the Google search volumes for certain keywords could outperform the market index by 310% over the 7-year period they investigate. Moat et al. (2013) find similar results using Wikipedia visitation statistics to predict stock returns. They show that a trading strategy based on changes in page views for the constituents of the Dow Jones Industrial Average outperforms the market index, and that applying the same strategy to Wikipedia articles on more general financial keywords gives similar results. Kristoufek (2013) studies the effect of Google search volumes (henceforth GSV) on portfolio diversification. He uses a diversification strategy that penalizes stocks with high search volumes to create a portfolio that dominates both the benchmark index and the equally weighted portfolio. The rationale behind this strategy is that search volume is correlated with stock riskiness. Damien and Ahmed (2013) test the claim that GSV contains enough information to predict future financial index returns. They take a more stringent approach that eliminates several of the biases in the results of Preis et al. (2013), and find that strategies based on financial keywords do not outperform strategies based on completely unrelated keywords.

We investigate whether search query data on company names can be used to predict weekly stock returns for individual firms. The results show that high GSV predicts low future returns. The relationship is weak but robust and statistically significant. However, the effect is not strong enough to support a trading strategy that remains profitable after transaction costs. The two papers most closely related to ours are Da et al. (2011) and Joseph et al. (2011). Both find that high GSV predicts high future returns for the first one to two weeks, followed by a reversal. However, these papers study the period from 2004 to 2008, whereas we use more recent data covering 2008 to 2013.
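
The trading strategy outlined in the abstract (buy stocks with infrequent Google searches, sell stocks with high GSV) and the way transaction costs erode its gross return can be sketched as follows. The GSV-based signal, the number of stocks per leg `k`, equal weighting and the proportional one-way cost `cost` are hypothetical choices for illustration; the sketch also ignores weight drift within the week. It is not the paper's own portfolio construction.

```python
def traded_fraction(old, new):
    """Notional traded to move from the old leg to the new one (equal weights, same size)."""
    if not old:
        return 1.0  # opening the leg from scratch
    replaced = len(new - old) / len(new)
    return 2 * replaced  # sell the dropped names and buy their replacements


def long_short_week(signal, next_returns, k, prev_longs, prev_shorts, cost):
    """Net one-week return of buying the k lowest-signal stocks and shorting the k highest.

    signal:       ticker -> GSV-based signal observed this week (hypothetical)
    next_returns: ticker -> stock return over the following week
    cost:         proportional one-way transaction cost, e.g. 0.001 for 10 basis points
    """
    ranked = sorted(signal, key=signal.get)  # lowest signal first
    if 2 * k > len(ranked):
        raise ValueError("long and short legs would overlap")
    longs, shorts = set(ranked[:k]), set(ranked[-k:])
    gross = (sum(next_returns[s] for s in longs) - sum(next_returns[s] for s in shorts)) / k
    costs = cost * (traded_fraction(prev_longs, longs) + traded_fraction(prev_shorts, shorts))
    return gross - costs, longs, shorts


signal = {"A": 90, "B": 10, "C": 50, "D": 70, "E": 30}
next_returns = {"A": -0.02, "B": 0.01, "C": 0.0, "D": 0.005, "E": 0.0}
net, longs, shorts = long_short_week(signal, next_returns, 1, set(), set(), 0.001)
print(round(net, 6), longs, shorts)
```

Charging costs on turnover rather than per week makes explicit why a weak signal that reshuffles the portfolio often can look profitable gross of costs and unprofitable net of them.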

This paper is structured as follows: Section 2 describes the datasets and our preliminary calculations. In Section 3 we describe our model, including an assessment of its robustness. In Section 4 we discuss the results and possible applications to a trading strategy. Section 5 concludes.

Data

The data used in this paper are obtained from Wharton Research Data Services (WRDS) and Google Trends. The WRDS data include daily opening prices, trading volumes, dividends and the number of shares outstanding for companies in the S&P 500 index from January 1, 2007 through December 31, 2013. We analyze GSV data from 2008 to 2013 because GSV is unreliable prior to 2008 (Damien and Ahmed, 2013), but we need stock data from 2007 to calculate 52-week rolling betas and moving averages for the stocks in 2008. The GSV data are indices (with values from 0 to 100) of US search volumes for the names of the companies in the S&P 500 from January 1, 2008 through December 31, 2013. We use companies in the S&P 500 index because of their size and because most of these companies have frequent data on Google Trends.
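
A 52-week rolling beta needs a full year of weekly returns before the first estimate is available, which is why the stock data start in 2007. The sketch below assumes weekly total returns built from opening prices plus dividends, and a market return series such as the S&P 500 index; the exact return definition and market proxy used in the paper are not stated in this section, and the function names are hypothetical.

```python
def weekly_returns(prices, dividends):
    """Total returns (P_t + D_t) / P_{t-1} - 1 from weekly prices and dividends paid in week t."""
    return [(p + d) / p_prev - 1 for p_prev, p, d in zip(prices, prices[1:], dividends[1:])]


def rolling_beta(stock, market, window=52):
    """Beta of stock on market over the trailing `window` weeks; None until enough history."""
    betas = []
    for t in range(len(stock)):
        if t + 1 < window:
            betas.append(None)
            continue
        s = stock[t + 1 - window:t + 1]
        m = market[t + 1 - window:t + 1]
        s_mean, m_mean = sum(s) / window, sum(m) / window
        # The 1/(n-1) factors of covariance and variance cancel in the ratio
        cov = sum((x - s_mean) * (y - m_mean) for x, y in zip(s, m))
        var = sum((y - m_mean) ** 2 for y in m)
        betas.append(cov / var)
    return betas


market = [0.01 * ((i % 5) - 2) for i in range(60)]  # synthetic market returns
stock = [2 * r for r in market]                      # a stock that moves twice as much
betas = rolling_beta(stock, market)
print(betas[50], betas[51])  # None until 52 weeks are available, then about 2.0
```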

Because Google Trends reports GSV only monthly for search terms with low search volume, we remove the affected companies from our dataset for consistency. In addition, we only include companies that were in the S&P 500 at the end of 2013 and for which we have complete stock data back to 2007. This leaves us with a complete dataset of 431 companies.
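
The three inclusion criteria above amount to a simple filter. The `Firm` record, its field names and the tickers in the example are hypothetical; they stand in for whatever firm-level flags one would derive from Google Trends and WRDS.

```python
from dataclasses import dataclass


@dataclass
class Firm:
    ticker: str
    weekly_gsv: bool                  # False if Google Trends reports only monthly values
    in_sp500_end_2013: bool           # index member at the end of 2013
    complete_prices_since_2007: bool  # no gaps in stock data from January 2007


def select_sample(firms):
    """Tickers of firms that satisfy all three inclusion criteria."""
    return [
        f.ticker
        for f in firms
        if f.weekly_gsv and f.in_sp500_end_2013 and f.complete_prices_since_2007
    ]


firms = [
    Firm("AAA", True, True, True),
    Firm("BBB", False, True, True),   # dropped: monthly GSV only
    Firm("CCC", True, True, False),   # dropped: incomplete price history
]
print(select_sample(firms))  # ['AAA']
```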
