Risk Premium Of Social Media Sentiment

Patrick Houlihan
Stevens Institute of Technology

Germán G. Creamer
Stevens Institute of Technology – Wesley J. Howe School of Technology Management

May 1, 2016


This research investigates the predictive capability of sentiment extracted using three dictionaries: financial, social media, and mood states. Our findings show that: 1) under the Fama-MacBeth regression method, social media-based sentiment measures can be used as risk factors in an asset pricing framework; 2) these sentiment measures have predictive capability when used as features in a machine learning framework; and 3) adjusting returns for market effects results in positive alpha.

Risk Premium Of Social Media Sentiment – Introduction

The human brain is composed of billions of sensory neurons, which continuously gather information through our five senses, process it, and deliver it across synapses to other neurons. Ultimately, some of this data is processed into a decision to perform an action or behavior, such as blinking an eye or formulating a response to a question. People are interconnected with one another much like the neurons in the human brain, and never more so than in recent years, with the internet now adopted by over 3 billion users worldwide. Many actions take place online, from posting tweets to executing trades in the stock market. The vast majority of these online actions are stored, and estimates show internet traffic will top 86,000 petabytes per month by 2018 (Vlachos [2015]). This data can be mined to measure author sentiment through text analysis (Bollen et al. [2009]; Hu et al. [2004]), or market data can be sifted to extract the trading patterns of a market participant. From a crowd-sourcing perspective, other examples include aggregating millions of blogs to extract the sentiment of an entire community (Pang et al. [2008]) or measuring trading imbalances between call and put options (Cao et al. [2003]). This research focuses on the former: extracting various sentiment measures from blogs using different dictionaries.

This paper evaluates whether sentiment extracted from a social media microblogging website can be used as a risk factor in an asset pricing framework. In addition, we explore whether this sentiment captures investor behavior and can be harnessed in a predictive analytics framework to realize abnormal gains adjusted for market effects.
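To make the asset pricing test concrete, the Fama-MacBeth procedure named in the abstract can be sketched as a two-pass regression. This is only an illustrative sketch, not the authors' specification, which this excerpt does not show. It assumes full-sample betas, plain OLS, and no Newey-West or Shanken corrections, and the function name `fama_macbeth` and its arguments are hypothetical.

```python
import numpy as np

def fama_macbeth(returns, factors):
    """Two-pass Fama-MacBeth sketch (assumed setup, not the paper's exact one).

    returns: (T, N) array of asset excess returns.
    factors: (T, K) array of factor realizations, e.g. a sentiment measure.
    Gives (premia, t_stats), each of length K + 1 with the intercept first.
    """
    T, N = returns.shape

    # Pass 1: one time-series regression per asset to estimate factor betas.
    X = np.column_stack([np.ones(T), factors])            # (T, K+1)
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)    # (K+1, N)
    betas = coef[1:].T                                     # (N, K)

    # Pass 2: one cross-sectional regression per period on the estimated betas.
    Z = np.column_stack([np.ones(N), betas])               # (N, K+1)
    lambdas, *_ = np.linalg.lstsq(Z, returns.T, rcond=None)
    lambdas = lambdas.T                                     # (T, K+1)

    # Risk premia are the time-series means of the period-by-period slopes;
    # t-statistics use the time-series standard error of those slopes.
    premia = lambdas.mean(axis=0)
    std_err = lambdas.std(axis=0, ddof=1) / np.sqrt(T)
    return premia, premia / std_err
```

Under these assumptions, a sentiment factor would be judged a priced risk factor when its estimated premium is statistically different from zero.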

Social Media Sentiment

Social media websites like Twitter have gained mass popularity and serve as a medium for communicating in a few sentences. Microblog posts such as tweets go directly to the point and are less verbose than other text (Twitter imposes a 140-character limit). These characteristics make them prime candidates for sentiment extraction in predictive analytics (Bermingham et al. [2010]).

Blogs and other online chat media are precursors of ‘real-world’ behavior. The sheer volume of postings related to various products on Amazon’s website is highly correlated with actual purchase decisions (Gruhl et al. [2005]). Gruhl et al.’s research was one of the first studies to validate the power of harnessing social media to predict consumer behavior. Google query search volume was also shown to be a strong predictor of future economic activity in various industries (Choi et al. [2012]), further reinforcing the internet as a viable source for predictive analytics. These studies reveal how the patterns of online consumers relate to economic activity. For instance, movie ticket sales are highly correlated with the volume of tweets about a movie before its release (Asur et al. [2010]).

Through natural language processing and machine learning algorithms, sentiment expressed as binary values (e.g. good/bad or positive/negative) can be extracted from different documents (Pang et al. [2002]). Given today’s computing power, it is possible to aggregate the sentiment of a large number of microblogging posts into a crowd-sourced measure that captures the overall sentiment of online communities. In addition, more recent research (Cambria et al. [2013]; Cambria et al. [2014]; Poria et al. [2014]) has shown a shift in natural language processing techniques from lexical semantics (single word meanings) to compositional semantics (sentence meanings), which has led to an increase in predictive accuracy.
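As a rough illustration of dictionary-based scoring and crowd-level aggregation, the sketch below scores each post against a word list and averages the scores by day. The lexicon here is a hypothetical toy list, not any of the three dictionaries used in the paper, and the function names `score_post` and `daily_sentiment` are likewise assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon for illustration only: word -> polarity (+1 / -1).
LEXICON = {"gain": 1, "beat": 1, "strong": 1, "loss": -1, "miss": -1, "weak": -1}

def score_post(text, lexicon=LEXICON):
    """Mean polarity of the lexicon words found in one post (0.0 if none)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

def daily_sentiment(posts, lexicon=LEXICON):
    """Aggregate (date, text) pairs into a {date: mean post score} mapping."""
    by_day = defaultdict(list)
    for day, text in posts:
        by_day[day].append(score_post(text, lexicon))
    return {day: sum(scores) / len(scores) for day, scores in by_day.items()}
```

A purely lexical scorer like this ignores negation and context, which is the limitation the compositional approaches cited above aim to address.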
