Identification of a Social Media Equity Factor Derived Directly from Tweet Sentiments

Jim Kyung-Soo Liew
Johns Hopkins University – Carey Business School (JHU)

Tamás Budavári
Johns Hopkins University – Department of Applied Mathematics and Statistics

January 6, 2016

Tweet Sentiments – Abstract:

We document that tweet sentiments explain the time-series variation in return for a sample of equity securities. Succinctly, tweet sentiment matters. We argue that even though these sentiments are company specific, they survive inclusion of widely popular financial factors. Notably, our “Social Media company-specific” factor is very different from these prior “macro” factors. The Social Media Factor consists of views generated from the crowd on a specific security. Some may argue that this company-specific noise should be diversified away and not receive any compensation over time, reducing this factor to irrelevance. We disagree. We show that a simple time-series of this social media factor survives in time-series multiple regressions. Over time, the market has been compensating exposure to changes in this factor. It appears the more popular a given stock is as measured by total daily tweet volume, the more powerful is its social media factor. We document that the social media factor is distinct from traditional factors authored by Fama and French (1992, 1993, and 2014).

Wall Street continues to struggle with the massive amounts of data generated from nontraditional sources. One such source, broadly classified as “social media data,” provides a wellspring of user-generated content. This content is, however, problematic, as it appears to grow exponentially and suffers massively from a lack of data integrity. That is, there are virtually no internal or external quality controls. Unlike mandatory financial disclosures and professional news sources, much of social media data is raw and unorganized. Given the lack of both standardization and oversight, the value and impact of such information on financial markets is a pressing concern for market professionals. Some professionals quickly discount such sources as “noise.” However, a growing body of research finds value in this data. In this work, we attempt to shed light on the importance of such sources of information for the financial markets by showing a clear link between stock tweet sentiments and security returns. Additionally, we argue that traditional asset pricing models based on market-wide factors need to be adjusted to include this new company-specific social media factor. We document that tweet sentiments explain the time-series variation in returns for a sample of securities. Succinctly, tweet sentiment matters. We argue that even though these sentiments are company specific, they survive inclusion of widely popular financial factors. Notably, our “social media company-specific” factor is very different from these prior “macro” factors. The social media factor consists of views generated from the crowd on a specific security. Some may argue that this company-specific noise should be diversified away and not receive any compensation over time, reducing this factor to irrelevance. We disagree.
We show that a simple time-series of this social media factor survives in time-series multiple regressions. Over time, the market has been compensating exposure to changes in this factor. It appears that the more popular a given stock is, as measured by total tweet volume, the more powerful its social media factor is. We document that the social media factor is distinct from the traditional factors authored by Fama and French (1992, 1993, 2014). We argue that tweet sentiment represents a security characteristic, similar to the argument made by Daniel and Titman (1996) with regard to their study of book-to-market (B/M). That is, the characteristics are firm specific. Unlike Daniel and Titman (1996), we argue that our characteristic matters in the context of our proposed social media factor model. We surmise that idiosyncratic risk could be broken further into two components: the social sentiment component and the “noise.” The noise is the standard residual component of empirical asset pricing models. The interesting contribution of our work is identifying the sentiment component explicitly. Our proposed model consists of the Fama-French Five-Factor Model and a company-specific Social Media Factor. Our study has the following weaknesses: (1) a short time period, with daily data from January 2013 to November 2015, and (2) a small number of securities examined: fifteen stocks (AAPL, FB, NFLX, YHOO, AMZN, MU, BAC, DIS, PCLN, FSLR, MSFT, GOOG, AAL, CHK, and RIG). These were selected based on the high volume of available sentiment data. Additionally, we performed robustness tests examining 498 total stocks, which support our main results. With this study we hope to inspire debate and further research. In this work, we show that tweet sentiments help explain the time-series variation of security returns beyond the variation explained by Fama-French’s Five-Factor Model (2014). Their impact is both statistically and economically significant.
Moreover, a simple story can be told about their existence: prices should incorporate all publicly available information, from traditional sources as well as from non-traditional sources such as tweets. With innovation from FinTech companies, the mechanism through which information is assembled and delivered has irrevocably changed. If social media platforms give us access to a pool of largely independently generated sentiments, then exposing the component of idiosyncratic risk that stems from participants’ aggregated sentiment could make markets more efficient. Since sentiment information appears to be transmitted and incorporated quickly and efficiently, it becomes vital to our understanding of security price behavior.
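The time-series regression described above, in which a company-specific sentiment term is added to the five Fama-French factors, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' specification: the data are synthetic, the sample length, coefficient values, and variable names (`ff5`, `d_sent`, `excess_ret`) are hypothetical, and the sentiment series is assumed to enter as a daily change. The sketch fits the five-factor model and the six-factor model by ordinary least squares and compares their fit.

```python
import numpy as np

def ols(y, X):
    """OLS with an intercept; returns coefficients, t-statistics and R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    dof = len(y) - X1.shape[1]
    sigma2 = resid @ resid / dof                      # residual variance
    cov = sigma2 * np.linalg.inv(X1.T @ X1)           # classical OLS covariance
    t_stats = beta / np.sqrt(np.diag(cov))
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, t_stats, r2

# Synthetic data only: every number below is an assumption for illustration.
rng = np.random.default_rng(0)
n_days = 700                                          # hypothetical sample length
ff5 = rng.normal(0.0, 0.01, size=(n_days, 5))         # stand-ins for the five factors
d_sent = rng.normal(0.0, 1.0, size=n_days)            # assumed daily change in sentiment
true_ff = np.array([1.1, 0.2, -0.3, 0.1, 0.05])       # hypothetical factor loadings
true_smf = 0.004                                      # hypothetical sentiment loading
noise = rng.normal(0.0, 0.01, size=n_days)            # residual "noise" component
excess_ret = ff5 @ true_ff + true_smf * d_sent + noise

# Five-factor baseline versus five factors plus the social media term.
_, _, r2_ff5 = ols(excess_ret, ff5)
beta6, t6, r2_six = ols(excess_ret, np.column_stack([ff5, d_sent]))

print(f"R^2 five-factor: {r2_ff5:.3f}")
print(f"R^2 six-factor:  {r2_six:.3f}")
print(f"sentiment loading: {beta6[-1]:.4f} (t = {t6[-1]:.1f})")
```

In this setup, the sentiment term "survives" if its t-statistic remains significant once the five factors are included, and the gain in R² measures the variation explained beyond the five-factor model. With real data, robust standard errors would likely be preferable to the classical ones used here.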

Introduction

Currently, many sites exist that allow users to share thoughts and ideas about financial markets: Seeking Alpha, Estimize, StockTwits, Twitter, etc. Crowd-sourced information is typically generated by individual market participants, such as hedge fund traders, individual traders, financial information providers, MBA students, bored retirees, etc. The most well-known of the micro-blogging sites used by such participants are Twitter and StockTwits. Both allow users to author specific tweets and attach them to particular securities. Twitter, however, grants only limited data access to non-paying users. StockTwits has granted us access to both their firehose and historical daily data for research purposes. In this work, we investigate the StockTwits data.

 

………………………..

We conclude that the Fama-French Five-Factor Model should be further decomposed into a Six-Factor Model, with the sixth factor being our Social Media Factor. Our work is the first to document a link between user-defined tweet sentiment and security prices, and the first to offer a theoretical justification that reconciles the Social Media Factor in the context of Fama and French’s Five-Factor mimicking portfolios.
