Google, Twitter Bullishness - Impact On Stocks: ECB Study

Quantifying The Effects Of Online Bullishness On International Financial Markets by European Central Bank

Abstract

Computational methods that rely on machine learning classifiers and lexicons to gauge investor sentiment from commonly used online data sources have shown considerable promise, but suffer from measurement and classification errors. In our work, we develop a simple, direct and unambiguous indicator of online investor sentiment, which is based on Twitter updates and Google search queries. We examine the predictive power of this new investor bullishness indicator for international stock markets. Our results indicate several striking regularities. First, changes in Twitter bullishness predict changes in Google bullishness, indicating that Twitter information precedes Google queries. Second, Twitter and Google bullishness are positively correlated with investor sentiment and lead established investor sentiment surveys. Twitter bullishness, in particular, is a more powerful predictor of changes in stock market sentiment than Google bullishness. Third, we observe that high Twitter bullishness predicts increases in stock returns, which subsequently revert to their fundamental values. We believe that our results may support the investor sentiment hypothesis in behavioural finance.

Quantifying The Effects Of Online Bullishness On International Financial Markets - Introduction

According to the Efficient Market Hypothesis (EMH; Fama, 1970), investors operate as rational actors and share prices therefore fully reflect all existing, new, and even hidden information. Traditional efficient market models, however, fail to explain important market anomalies, such as the Great Crash of 1929, the Black Monday crash of October 1987, the dot-com bubble of the late 1990s and the stock market collapse of 2008. Behavioural finance challenges the EMH by emphasising the important role of behavioural and emotional factors in investor behaviour (Kahneman and Tversky, 1979; Shiller, 2006). Behavioural finance is based on two major assumptions: (i) “investor sentiment”, i.e. the actions of investors are determined by sentiment as well as by rational considerations; and (ii) “limits-to-arbitrage”, i.e. betting against irrational investors is costly and risky. Because sophisticated investors can arbitrage only to a limited extent, investor sentiment can influence stock prices (De Long et al., 1990a). In addition, investor sentiment or perceptions about the market can directly reflect general consumer sentiment about the economy, which can in turn influence consumer spending (Carroll et al., 1994). Timely information on investor sentiment and consumer confidence can help government policy-makers and central banks to anticipate market trends and plan ahead. The assessment and measurement of investor sentiment and its effects has therefore become an important research topic (Baker and Wurgler, 2007).

In recent years, researchers have explored a variety of computational methods to measure the investor sentiment indicated by commonly used online data sources, such as stock message boards, news reports, microblogging environments, blogs and search engine queries. This approach holds considerable promise, given the unprecedented scale, high degree of detail, low cost and high frequency of the underlying data.

To the best of our knowledge, existing market sentiment measures are either classifier-based or dictionary-based. In Antweiler and Frank (2004), two popular classifiers, Naive Bayes and Support Vector Machine (SVM), are employed to classify stock messages into three categories: “bullish”, “bearish” and “neutral”. This research found that message bullishness and volume can help predict market volatility, but are of limited value when it comes to predicting returns. Similar results have been obtained in later work that uses as many as five classifier algorithms (Das and Chen, 2007). The latest and most relevant study (Oh and Sheng, 2011) classifies stock tweets from StockTwits® into bullish and bearish categories, and builds a bullishness index that is shown to be predictive of future share price movements.
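
As a rough illustration of the classifier-based approach, the sketch below trains a small multinomial Naive Bayes model with add-one smoothing on a handful of labelled messages and aggregates the predicted labels into a bullishness index. The training messages, the function names and the log-ratio form of the index are all assumptions made for this example; they are not taken from the data or code of any study cited above.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-labelled training messages (illustrative only, not real data).
TRAIN = [
    ("strong earnings buy now", "bullish"),
    ("great quarter going long", "bullish"),
    ("buy the dip strong upside", "bullish"),
    ("weak guidance sell now", "bearish"),
    ("terrible quarter going short", "bearish"),
    ("sell the rally weak outlook", "bearish"),
    ("earnings call scheduled tomorrow", "neutral"),
    ("company announces new board member", "neutral"),
]


def train_naive_bayes(examples):
    """Collect per-class message counts, per-class word counts and the vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_docs[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab


def classify(text, model):
    """Return the class with the highest log posterior under add-one smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        score = math.log(class_docs[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen during training
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def bullishness(labels):
    """Assumed index: log ratio of bullish to bearish counts, plus one to avoid log(0)."""
    counts = Counter(labels)
    return math.log((1 + counts["bullish"]) / (1 + counts["bearish"]))


model = train_naive_bayes(TRAIN)
messages = ["strong upside buy", "weak outlook sell", "going long great earnings"]
labels = [classify(message, model) for message in messages]
index = bullishness(labels)
print(labels, round(index, 3))
```

A positive index indicates that bullish messages outnumber bearish ones in the batch. With a training set this small the classifier is only a toy; the point is the two-stage structure of classifying messages and then aggregating the labels.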

Alongside machine learning approaches, a number of studies have focused on the development of linguistic lexicons or dictionaries to determine investor sentiment from the frequency of words in financial data sources. Perhaps the most influential study is that by Tetlock (2007), which uses the frequency of words on the Harvard negative word list (Harvard-IV-4 TagNeg) in daily news items to construct a pessimism indicator. This indicator was found to predict daily Dow Jones returns and, in the author’s later work, company share prices (Tetlock et al., 2008). However, Loughran and McDonald (2011) argue that the Harvard Psychosociological Dictionary was developed for the fields of psychology and sociology: hence, many words that it classifies as negative are not negative in a financial context. They developed an alternative negative word list comprising 2,337 words, which was found to outperform the Harvard dictionary in measuring financial sentiment.
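
The dictionary-based approach can be sketched in a few lines: a pessimism score is the share of words in a text that appear on a negative word list. The two word lists below are tiny hypothetical stand-ins, chosen only to show how a general-purpose list and a finance-specific list can score the same sentence differently; they are not excerpts from the Harvard or Loughran and McDonald lists, and the function name is likewise an assumption.

```python
import re

# Hypothetical mini word lists, for illustration only; real lexicons
# contain thousands of entries.
GENERAL_NEGATIVE = {"decline", "weak", "lawsuit", "liability", "tax", "cost"}
FINANCE_NEGATIVE = {"decline", "weak", "lawsuit"}


def pessimism_score(text, negative_words):
    """Return the share of tokens in `text` that appear in `negative_words`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in negative_words)
    return hits / len(tokens)


# A routine accounting sentence that carries no negative news.
news = "Tax liability and cost of capital were unchanged"
general_score = pessimism_score(news, GENERAL_NEGATIVE)
finance_score = pessimism_score(news, FINANCE_NEGATIVE)
print(general_score, finance_score)
```

Here the general-purpose list flags three of the eight words as negative, while the finance-specific list flags none, which is the kind of misclassification that motivates a domain-specific dictionary.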

Classifiers and dictionary-based methods are useful for automatically processing large sets of text data used to produce general sentiment indicators. However, the variegated contexts and subtleties of human language pose a tough challenge for human raters and text analysis algorithms. In fact, the low accuracy with which humans themselves can assess text sentiment inevitably sets an unfavourable upper bound on what the best supervised classifiers can achieve. According to some studies (Das and Chen, 2007; Oh and Sheng, 2011; Pang and Lee, 2008), a machine learning classification accuracy of between 60% and 70% is considered to be acceptable. Dictionary-based methods do not require human-defined ground truth or supervision, but dictionary words are usually selected on the basis of ad hoc criteria. Furthermore, word-weighting schemes may be biased and context-sensitive, and dictionaries cannot be adjusted to reflect varying word contexts and semantics.