Big Data & Investment Management: The Potential To Quantify Traditionally Qualitative Factors by Citi Business Advisory Services
Big data is a catchphrase for a new way of conducting analysis. Big data principles are being adopted across many industries and in many varieties. However, adoption so far by investment managers has been limited. This may be creating a window of opportunity in the industry.
Investment managers who are able to harness this new approach could potentially create an information edge in their trading that would put them significantly ahead of their peers and allow them to profit from an “information arbitrage” between their expanded models and those of investment managers following more traditional analytic techniques.
Big data technology increases 1) the volume of data that can be incorporated into investment models and 2) the velocity at which that data can be processed.
- Big data is based on the blueprint laid out by Google in 2003 around the technology and architecture it developed to handle the massive amounts of data it had to process, store and analyze from retained searches and other applications.
- Big data technologies rely on file-based databases in addition to traditional relational databases. As such, they can store not only structured data, but unstructured data as well. This means that new data sets can be added to a quantitative or systematic model more easily, without a lengthy cleansing, normalization, mapping and upload process.
- Big data technologies also rely on clusters of distributed commodity hardware for processing and on techniques that bounce inquiries from cluster to cluster to utilize any free capacity within the system. This differs from sending point-to-point inquiries to a dedicated server. Because of their distributed nature, big data technologies can process large sets of data at high speeds that facilitate the exploration and testing of new investment hypotheses.
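The distributed processing described in the bullets above can be illustrated with a small map-and-reduce sketch. This is only a single-machine analogy, not the architecture of any specific big data product: the text snippets, the `PARTITIONS` layout and the function names are hypothetical, and threads stand in for cluster nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical unstructured text, split into partitions as if spread across nodes.
PARTITIONS = [
    ["strong demand for the new phone", "supply chain delays reported"],
    ["demand remains strong", "new store openings delayed"],
    ["analysts expect strong demand"],
]


def map_partition(docs):
    """Map step: count words within one partition, independently of the others."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def word_counts(partitions, workers=3):
    """Hand each partition to whichever worker is free, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce(merge_counts, partials, Counter())


totals = word_counts(PARTITIONS)
print(totals.most_common(3))
```

Because each partition is counted independently, adding more workers (or, in a real cluster, more machines) does not change the result, only how the work is spread out.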
The third facet of the big data phenomenon relates to the variety of data that can now be accessed and the fact that many of these data sets did not exist a few years ago. Parallel to advances in the storage and processing of data has been the development of new types of data being created, primarily due to the growth of the Internet, the advance of social media and the emergence of a new Internet of Things that provides sensory readouts on a huge variety of physical subjects. As these new content sources have developed, there has been a surge in “datafication”, which is defined as “the ability to render into data many aspects of the world that have never been quantified before.”
With the improved volume, velocity and variety of data inherent in the big data approach, the innovation seen in systematic trading models over the past decade could accelerate. Similarly, a wave of innovation could begin in the quantitative investment space as the differences between what used to represent quantitative versus qualitative research disappear.
- Quantitative fundamental investment researchers could employ big data techniques to expand the number of variables they examine to include data around behaviors, opinions and sensory feedback, areas that were previously only the domain of discretionary fundamental researchers. This could allow for a broader model-based view of what constitutes relative, analogous, superior and inferior value using a new set of data points that are not being incorporated into more traditional investment models. This has the potential to create an information arbitrage between firms that are leveraging big data principles and firms that are not.
- Systematic trading models could process new data inputs at the same volume and speed that current programs use in reviewing price, order and transaction data. Rather than simply selecting trades based on analysis across these traditional data sets, new programs may begin to look for correlations across many inputs and thus prove able to identify new trading patterns that link price activity to non-price related variables. “Multi-factor” systematic programs using this broader data set could realize an information edge that today’s multi-system, multi-term and multi-tier systems cannot equal.
- New modeling capabilities linked to the big data approach, such as predictive analytics and machine learning, could change the nature of investment research by creating models that “think” and are able to draw forward-looking conclusions. This could lead to a convergence of quantitative fundamental models that focus on value with systematic trading programs that focus on price. The result could be a new type of automated portfolio management that focuses on “future value” and acts on “likely” events that may not have yet occurred or been announced.
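As a rough illustration of what linking price activity to non-price variables might look like, the sketch below measures how a hypothetical daily sentiment score correlates with the next day's returns. The data is invented for the example (the returns are constructed to follow the prior day's sentiment exactly), the function names are assumptions, and a correlation of this kind is a starting point for a hypothesis rather than a trading signal.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(signal, returns, lag):
    """Correlate the signal on day t with returns on day t + lag."""
    if lag < 0 or lag >= len(signal) - 1:
        raise ValueError("lag must be between 0 and len(signal) - 2")
    if lag == 0:
        return pearson(signal, returns)
    return pearson(signal[:-lag], returns[lag:])


# Hypothetical data: a daily sentiment score and daily price returns.
sentiment = [0.2, -0.1, 0.4, 0.0, -0.3, 0.5, 0.1]
returns = [0.0, 0.002, -0.001, 0.004, 0.0, -0.003, 0.005]

same_day = lagged_correlation(sentiment, returns, lag=0)
next_day = lagged_correlation(sentiment, returns, lag=1)
print(f"same-day: {same_day:.2f}, next-day: {next_day:.2f}")
```

Scanning several lags and several candidate inputs in this way is one simple form of the multi-input search the text describes. Real models would also need to guard against spurious correlations.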
The firms surveyed for this report caution that for most investment managers these changes in approach are still highly aspirational and there are still several obstacles limiting big data adoption.
- Currently the spectrum of big data adoption is broad. Early adopters are investing heavily in developing a whole technology stack and hiring data scientists to support investment research. Another segment of funds is experimenting with big data by either enlisting big data techniques that extend their existing research capabilities through proofs of concept or by piloting content from third-party providers utilizing big data technology and new data sets. However, based on our survey of investment managers and service providers, most investment firms are not yet focused on big data because they lack the institutional momentum, the skill set and the business case to build out these capabilities in the short term.
- Experimentation with and usage of big data technology are being driven by the front office and not IT. Investment firms have been seeking ways to tie big data to alpha generation. In most instances, this effort begins organically. A specific research analyst may put in a request to analyze a new data set to understand its relationship to time-series data, leading IT to accommodate the request tactically. Currently, this is not enough to drive wholesale change, but it is beginning to move investment managers into big data adoption. Based on feedback from survey participants, we believe that in 2015 pockets of this type of data analysis will drive a broader array of funds towards a more mature and holistic approach in supporting big data capabilities, similar to current early adopters.
- Pressure to experiment with and incorporate big data principles into investment research will build because early adopters are already locking up access to semi-private data sets to provide their models with an information edge. Early adopters are also already using this obscure data in tandem with more readily available data sets, such as social media, government and consumer transaction data. This competency may yield information arbitrage, giving firms that employ big data techniques an advantage over late adopters for some time until these techniques are utilized by more organizations.
- Efforts to accelerate the adoption of big data principles are being facilitated by a marketplace of third-party providers and data vendors. This allows a broader swath of investment managers to acquire some basic big data capabilities without full-scale infrastructure and staff investments. Even investment managers who do not acquire teams of data scientists and specialized technology staff will still be able to participate in the evolution of big data in other ways via options discussed later in Section IV.
Other firms surveyed with robust big data programs report that results are not easily obtained.
- Gaining advantage from big data requires the right set of questions, experimentation and time for patterns to emerge. Funds that have already experimented with unique data sets have also experienced some failure in obtaining investible insights. Some data sets are not necessarily the right ones for the questions posed due to cultural, regional or other yet-to-be-understood nuances. Funds are spending long periods of time beta testing data as it often takes time for patterns to emerge and because some of the data being investigated is so new that the nature of the data is changing over time. Those that are successful are also not necessarily disclosing where they have found an advantage.
- There are many integration and cultural challenges that must be understood in bringing new skill sets into the investment management arena. Many of the new resources coming to investment managers to spur their big data programs come from Internet firms, gaming companies, the military and organizations focused on interpreting and taking advantage of consumer behavior. These resources, as well as existing investment management researchers, need training to work effectively together.
When it works, however, the benefits of big data are not only being seen in the front office. Key functional areas within funds, such as compliance, are reportedly beginning to rely heavily on big data for emerging use cases such as eDiscovery for trade surveillance or on utilizing outside firms to help standardize compliant uses of social media. Risk teams are looking at running more robust scenario analysis. Marketing teams are looking to examine investor and distribution information to better target capital-raising efforts. With time, investment managers and external service providers may well identify a growing set of non-investment uses for big data that could reduce costs and provide greater operational insight into investment management organizations.
Introduction and Methodology
The ability to mine insights from new types of data in large, complex, unstructured data sets and use new technologies and non-traditional skill sets to probe investment theses is enhancing the way that investment managers perform research. More broadly, the potential presented by big data is also allowing for a new approach to risk management, compliance and other critical investment support functions.
Big data is a larger construct that has been made possible by a convergence of social trends, new data sources, technologies, modes of distribution and merging disciplines. It has yielded a new class of data sets, technologies and an industry of vendors to support it, providing multiple ways for managers of diverse strategies and sizes to take advantage of big data.
To understand the current state of play for investment managers regarding big data, Citi Business Advisory Services partnered with First Derivatives to conduct a survey of industry participants. These interviews were qualitative in nature, focusing on existing use cases, trends, expectations and predictions about where big data principles could take the industry.
Interviews were conducted across a set of investment managers as well as with leading vendors and service providers in the big data space who shared insights about their financial services clients. Where appropriate, some relevant quotes have been included from these interviews to allow readers to experience the statements that helped formulate the views presented in this paper.
Extensive research was also performed, drawing on information and views from many leading thinkers in the big data space as well as from experts on blending teams and creating innovative work environments. Footnotes throughout the report cite these articles and books, and readers interested in more detail on any of these topics are encouraged to refer to them.
The structure of the paper is as follows:
- Section I provides an overview and explanation of what constitutes big data and how it is different from other analytic frameworks available in earlier times.
- Section II looks at the new data sets emerging as a result of the “datafication” process and highlights vendors who can be accessed to receive such data.
- Section III examines how systematic trading programs and quantitative fundamental programs have evolved and contrasts the current state of the industry with how these investment processes could change as a result of big data principles. These findings are extrapolated to present a possible future where these two types of investment analysis could converge and create a whole new approach of automated portfolio management.
- Section IV presents a maturity model that shows how investment management firms can begin their journey toward incorporating big data principles and illustrates how capabilities and management buy-in change as a firm becomes more advanced.
Each of these sections is written with the average front office user in mind. As such, there are only high-level explanations and mentions of the technologies that make up the big data paradigm. The appendix to this paper goes into a much deeper level of detail and is geared toward the IT team within an investment management organization.
Section I: Understanding “Big Data” – New Thresholds of Volume, Velocity and Variety
Big data is a concept that encompasses new technologies and also relies on the melding of new skill sets, analysis techniques and data sets that have only become available in recent years.
To understand how revolutionary the big data concept is, it is instructive to start by contrasting the new approach with what traditional analysis looked like in the pre-big data era.
Traditional Quantitative Approaches to Data Analysis
Traditionally, data analysis has implied a quantitative approach, whereby highly structured data is processed in a spreadsheet or a database. Chart 1 shows that originally this set of data was solely composed of numbers, but as digitization began in the computer age, teams became able to take documents and images and convert those data sources into inputs that could also be captured for use in quantitative analysis.
Capturing these new digitized data sets and combining numerical data inputs from several sources typically required substantial normalization and mapping of data fields so that the information could be uniformly accessed. For the past 40 years, relational database management systems (RDBMS) have been processing and storing structured data in both commercial (Oracle, Sybase, MS SQL Server, DB2, Informix) and open source (MySQL) databases.
These databases are characterized as tabular, highly dependent on pre-defined data definitions, and query based (“structured query language” or “SQL”). They require a great deal of upfront work and planning in scaled applications. Even with the emergence of sophisticated standards, adapters, translation mechanisms and indexing strategies for media, they cannot address all current data needs. Going from proof of concept to full-scale enterprise production applications that use relational databases requires normalization of data streams, which is time-consuming and resource-intensive.
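The dependence on pre-defined data definitions can be seen in a small sketch that uses SQLite, an embedded relational database, as a stand-in for the enterprise systems named above. The table, the tickers and the `social_sentiment` field are hypothetical. The point is only that a record carrying an unplanned field is rejected by the fixed table, while a document-style store keeps it as-is.

```python
import json
import sqlite3

# Hypothetical records; the second carries a field the table was not designed for.
records = [
    {"ticker": "ABC", "close": 101.5},
    {"ticker": "XYZ", "close": 47.2, "social_sentiment": 0.63},
]

# Relational storage: the columns must be defined before any data arrives.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (ticker TEXT NOT NULL, close REAL NOT NULL)")


def insert_relational(connection, record):
    """Insert one record into the fixed-schema table.

    Column names come from the record's keys, which this sketch treats as trusted.
    """
    columns = ", ".join(record)
    placeholders = ", ".join("?" for _ in record)
    connection.execute(
        f"INSERT INTO prices ({columns}) VALUES ({placeholders})",
        tuple(record.values()),
    )


rejected = []
for record in records:
    try:
        insert_relational(conn, record)
    except sqlite3.OperationalError:
        # The table has no column for the new field, so the insert fails.
        rejected.append(record["ticker"])

stored_rows = conn.execute("SELECT ticker, close FROM prices").fetchall()

# Document-style storage: each record is kept whole, one JSON document per entry.
document_store = [json.dumps(record) for record in records]
loaded = [json.loads(line) for line in document_store]

print("rejected by fixed schema:", rejected)
print("kept as documents:", len(loaded))
```

In the relational case the new field would require a schema change and a mapping step before loading. In the document case the interpretation of the field is deferred until the data is read.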
The speed at which data can be evaluated in RDBMS is also highly dependent on the scale of the enterprise’s infrastructure; inquiries are processed only as quickly as the speed of the organization’s servers will allow. While many groups seek to extend that processing ability by outsourcing their backbone to a managed services provider, the speed at which information can be processed is still limited by the size of the organization’s overall platform.
Moreover, it has been nearly impossible for these databases to capture the qualitative inputs that investment teams often use to supplement their quantitative analysis approach. Such qualitative information could include opinions on a company’s management team; consumer views of the company’s brand; or how the company behaved in pursuing its business goals, such as the cleanliness of its production facilities, the level of controls it had around its supply chain, or how it responded to unexpected manufacturing issues, mechanical breakdowns and employee situations.