Volatility Forecasting: The Role of Internet Search Activity and Implied Volatility
West Virginia University – College of Business & Economics
West Virginia University – College of Business & Economics
Marketa Halova Wolfe
Skidmore College – Department of Economics
May 1, 2016
A growing literature has documented that Internet search activity is associated with volatility in the financial and commodity markets. We reexamine the role of Internet search activity in the context of volatility prediction in these markets. We broaden the scope by including three traditional predictors (returns, trading volume and implied volatility) and by using not only in-sample but also out-of-sample analysis. We find that implied volatility plays a crucial role in evaluating the contribution of Internet search activity data. Our results show that the predictive role of Internet search activity data disappears in the stock index and foreign exchange markets and declines substantially in the commodity markets once implied volatility is included in the benchmark model. This finding contributes to our understanding of what informational content is captured by Internet search activity data: it appears to capture information similar to that contained in implied volatility.
Introduction
A growing literature has shown that Internet search activity is associated with volatility in the financial and commodity markets. This literature includes Vlastakis and Markellos (2012), who analyze individual stocks; Dimpfl and Jank (2016) and Dzielinski (2012), who examine stock indices; Da, Engelberg and Gao (2015), who study stock indices, exchange traded funds and Treasury bonds; Goddard, Kita and Wang (2012) and Smith (2012), who analyze exchange rates; and Vozlyublennaia (2014), who examines stock and bond indices, gold and crude oil.
We build on this literature in two ways. First, we broaden the analysis of volatility prediction using Internet search activity by including three traditional predictors: returns, trading volume and implied volatility. Second, we extend the investigation from the in-sample analysis of previous literature to out-of-sample forecasting, where we pay close attention to the selection of the benchmark model against which predictors are evaluated. This broader scope allows us to carefully compare the information content of the traditional predictors and of Internet search activity, and to check that our conclusions hold both in-sample and out-of-sample.
We use Google search volume data, which are available at weekly frequency since January 2004. We cover the stock (S&P 500 and DJIA), foreign exchange (euro and Canadian dollar), and commodity (gold and crude oil) markets. We build on the seminal work of Andersen, Bollerslev, Diebold and Labys (2001), who propose measuring volatility as realized volatility, computed as the realized standard deviation of 5-minute continuously compounded returns. We also build on subsequent research by Andersen, Bollerslev, Diebold and Labys (2003) and Andersen, Bollerslev and Meddahi (2004), who propose forecasting volatility with reduced-form models of realized volatility because these outperform models such as the generalized autoregressive conditional heteroskedasticity (GARCH) model.
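The realized standard deviation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the input is a single uninterrupted sequence of 5-minute prices covering one measurement window (for example, one trading day, or one week to match the weekly Google data), so any special treatment of overnight or weekend gaps is left out, and the function names are hypothetical.

```python
import math
from typing import Sequence


def log_returns(prices: Sequence[float]) -> list[float]:
    """Continuously compounded returns between consecutive 5-minute prices."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def realized_volatility(prices: Sequence[float]) -> float:
    """Realized standard deviation: the square root of the sum of squared
    intraperiod log returns within one measurement window.
    Assumption (hypothetical helper): prices form one uninterrupted window."""
    return math.sqrt(sum(r * r for r in log_returns(prices)))


# Hypothetical example: five 5-minute price observations in one window
window = [100.0, 100.2, 99.9, 100.1, 100.4]
print(f"realized volatility: {realized_volatility(window):.6f}")
```

Because the squared returns are summed rather than averaged, the measure grows with the length of the window, so daily and weekly realized volatilities are not directly comparable without rescaling.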
In the in-sample analysis, we employ a vector autoregressive (VAR) model, Granger causality tests, and forecast error variance decomposition, similarly to previous literature. In-sample results, however, are prone to pitfalls such as spurious associations and overfitting. We therefore extend the previous literature with out-of-sample evaluations, which have been effective in mitigating these problems. The key out-of-sample evaluation concept here is encompassing: if model 1 contains all the information relevant for forecasting the target variable that model 2 contains, then model 2's forecasts cannot help reduce model 1's forecast errors, and model 1 is said to encompass model 2. Otherwise, model 2 provides additional information and is not encompassed by model 1. This concept is especially useful in our context because we want to measure the marginal contribution of individual predictors. We begin with a simple four-lag autoregressive model of realized volatility (AR4). Against this benchmark, we evaluate the marginal contribution of four predictors proposed in the previous literature: trading volume, returns, implied volatility, and Google search volume. We find that the AR4 model augmented with implied volatility substantially outperforms the other models, in line with previous literature that uses implied volatility to forecast realized volatility (for example, Christensen and Prabhala, 1998, and Busch, Christensen and Nielsen, 2011). We then use this expanded benchmark to evaluate the remaining predictors (trading volume, returns and Google search volume). We find that the usefulness of Google search volume for forecasting realized volatility disappears in the financial markets and declines substantially in the commodity markets once implied volatility is included in the model. The in-sample analysis yields the same result.
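A minimal sketch of this out-of-sample comparison in Python follows. It is an illustration under stated assumptions, not the authors' implementation: it assumes one-step-ahead forecasts from an expanding (recursive) estimation window, a single candidate predictor `x` (for example, implied volatility or Google search volume) entered with one lag, and the ENC-NEW encompassing statistic of Clark and McCracken (2001) as the test. All function names are hypothetical.

```python
from typing import Sequence

# Assumptions (not from the paper): expanding estimation window,
# one-step-ahead forecasts, predictor enters with one lag,
# ENC-NEW as the encompassing test. Function names are hypothetical.


def ols(X: list[list[float]], y: list[float]) -> list[float]:
    """OLS coefficients from the normal equations (X'X) b = X'y,
    solved by Gaussian elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k + 1):
                A[r][j] -= f * A[c][j]
    # Back substitution on the upper-triangular system
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def regressors(rv: Sequence[float], x: Sequence[float], t: int,
               use_x: bool) -> list[float]:
    """Regressors for predicting rv[t]: a constant, four lags of realized
    volatility (the AR4 benchmark) and, optionally, one lag of predictor x."""
    row = [1.0] + [rv[t - lag] for lag in range(1, 5)]
    if use_x:
        row.append(x[t - 1])
    return row


def recursive_forecast_errors(rv: Sequence[float], x: Sequence[float],
                              first_oos: int, use_x: bool) -> list[float]:
    """One-step-ahead forecast errors for t = first_oos, ..., len(rv) - 1,
    re-estimating the model on all observations before t."""
    errors = []
    for t in range(first_oos, len(rv)):
        X = [regressors(rv, x, s, use_x) for s in range(4, t)]
        y = [rv[s] for s in range(4, t)]
        b = ols(X, y)
        forecast = sum(bi * zi for bi, zi in zip(b, regressors(rv, x, t, use_x)))
        errors.append(rv[t] - forecast)
    return errors


def enc_new(e_bench: Sequence[float], e_aug: Sequence[float]) -> float:
    """ENC-NEW = P * mean(e1^2 - e1*e2) / mean(e2^2), where e1 are the
    benchmark errors and e2 the errors of the model that nests it.
    Large positive values are evidence against encompassing."""
    P = len(e_bench)
    num = sum(a * a - a * b for a, b in zip(e_bench, e_aug)) / P
    den = sum(b * b for b in e_aug) / P
    return P * num / den
```

Calling `recursive_forecast_errors(rv, x, first_oos, use_x=False)` gives the AR4 errors and `use_x=True` the augmented model's errors; `enc_new` of the two is large and positive when the predictor carries information the AR4 benchmark lacks. Because the models are nested, ENC-NEW has a nonstandard distribution, so critical values come from Clark and McCracken's tables or a bootstrap rather than from the normal distribution. Evaluating predictors against the expanded benchmark described above would require adding implied volatility as a further column in `regressors`.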
This result contributes to our understanding of what informational content is captured by Internet search activity data. Previous papers, for example, Da, Engelberg and Gao (2011), Goddard, Kita and Wang (2012), Vlastakis and Markellos (2012), and Vozlyublennaia (2014), propose that Internet search activity captures investor attention or information demand. Our results suggest that it captures some of the same information as implied volatility, which represents the market’s expectation of future volatility over the life of the options. To illustrate what information implied volatility reflects, Neely (2005) analyzes news events around the largest changes in the implied volatility of options on Eurodollar futures from 1985 to 2001. The stock market crash of 1987, President George H. W. Bush asking Congress for authority to oust Iraq from Kuwait, the Russian debt crisis, and two sharp declines in the U.S. trade deficit rank among the top five events. Developments in the financial markets and in U.S. monetary policy are among the other influential events. In this sense, the previous studies on the usefulness of Internet search activity for forecasting realized volatility are not misguided: Internet search activity likely does reflect interest in acquiring more information about a variety of events. However, perhaps because most of these Internet searches come from the general public and often do not translate into trading in the financial and commodity markets, the effect of implied volatility subsumes the noisier effect of Internet search activity.
Our approach is useful for understanding the contribution of Internet search activity because its relation to implied volatility and other traditional predictors has not been comprehensively analyzed. With two exceptions, papers on forecasting volatility with Internet search activity do not consider the role of implied volatility. In a single-equation OLS regression, Dzielinski (2012) finds that Google search volume remains significant in-sample even after controlling for implied volatility in the S&P 500 from 2005 to 2009. We broaden this approach in two ways: first, by including other traditional predictors in an in-sample VAR framework, and second, by forecasting out-of-sample. Interestingly, Dimpfl and Jank (2016) briefly mention that the effect of Google search volume decreases but is not eliminated when implied volatility is added to their in-sample VAR analysis of DJIA realized volatility, trading volume and Internet search activity from 2006 to 2011. This contrasts sharply with our results, in which the usefulness of Google search volume disappears in the DJIA once implied volatility is included.
The remainder of this paper is structured as follows. Section 2 describes the data. Section 3 presents the methodology and empirical results. Section 4 concludes with a brief discussion of the usefulness of Internet search activity data in fields beyond financial and commodity market volatility.