What Does Correlation Look Like? Optical Illusion / Optical Truth by W. Ben Hunt, Ph.D. – Salient Partners
Portion of original dot map by Dr. John Snow, the founding father of epidemiology, showing the clusters of cholera cases in the London epidemic of 1854. The visual representation of Snow’s data analysis convinced local authorities to shut down the contaminated public well at ground zero of the cholera outbreak, although it would be another 20 years before Snow’s arguments in favor of germ theory and a direct connection between cholera and fecal contamination of the water supply would be widely accepted.
John Snow, “On the Mode of Communication of Cholera” (1855)
Anscombe’s Quartet: four datasets that appear nearly identical using summary statistical methods (mean, variance, correlation, linear regression), but are completely different in meaning and composition – a difference that is clearly revealed through visual inspection.
Frank Anscombe, “Graphs in Statistical Analysis,” The American Statistician v.27 no.1 (1973); drawing by Schutz
Charles Joseph Minard, “Carte Figurative” of Napoleon’s 1812 Russian Campaign (1869)
The Minard Map: a map of Napoleon’s disastrous invasion of Russia in 1812, showing six distinct data dimensions (troop strength, temperature, distance marched, geographic latitude and longitude, direction of travel, location at event dates) in 2-dimensional form.
Here too it’s masquerade, I find:
Edouard de Reszke as Mephistopheles
So, so you think you can tell
Heaven from Hell,
Blue skies from pain.
Can you tell a green field
From a cold steel rail?
A smile from a veil?
Do you think you can tell?
– Roger Waters, “Wish You Were Here” (1975)
A great deal of intelligence can be invested in ignorance when the need for illusion is deep.
– Saul Bellow, “To Jerusalem and Back” (1976)
It is difficult to get a man to understand something, when his salary depends on his not understanding it.
Knowledge kills action; action requires the veils of illusion.
To find out if she really loved me, I hooked her up to a lie detector. And just as I suspected, my machine was broken.
Edward Tufte is a personal and professional hero of mine. Professionally, he’s best known for his magisterial work in data visualization and data communication through such classics as The Visual Display of Quantitative Information (1983) and its follow-on volumes, but less well-known is his outstanding academic work in econometrics and statistical analysis. His 1974 book Data Analysis for Politics and Policy remains the single best book I’ve ever read in terms of teaching the power and pitfalls of statistical analysis. If you’re fluent in the language of econometrics (this is not a book for the uninitiated) and now you want to say something meaningful and true using that language, you should read this book (available for $2 in Kindle form on Tufte’s website). Personally, Tufte is a hero to me for escaping the ivory tower, pioneering what we know today as self-publishing, making a lot of money in the process, and becoming an interesting sculptor and artist. That’s my dream. That one day when the Great Central Bank Wars of the 21st century are over, I will be allowed to return, Cincinnatus-like, to my Connecticut farm where I will write short stories and weld monumental sculptures in peace. That and beekeeping.
But until that happy day, I am inspired in my war-fighting efforts by Tufte’s skepticism and truth-seeking. The former is summed up well in an anecdote Tufte found in a medical journal and cites in Data Analysis:
One day when I was a junior medical student, a very important Boston surgeon visited the school and delivered a great treatise on a large number of patients who had undergone successful operations for vascular reconstruction. At the end of the lecture, a young student at the back of the room timidly asked, “Do you have any controls?” Well, the great surgeon drew himself up to his full height, hit the desk, and said, “Do you mean did I not operate on half of the patients?” The hall grew very quiet then. The voice at the back of the room very hesitantly replied, “Yes, that’s what I had in mind.” Then the visitor’s fist really came down as he thundered, “Of course not. That would have doomed half of them to their death.” God, it was quiet then, and one could scarcely hear the small voice ask, “Which half?”
The latter quality — truth-seeking — takes on many forms in Tufte’s work, but most noticeably in his constant admonitions to LOOK at the data for hints and clues on asking the right questions of the data. This is the flip-side of the coin for which Tufte is best known, that good/bad visual representations of data communicate useful/useless answers to questions that we have about the world. Or to put it another way, an information-rich data visualization is not only the most powerful way to communicate our answers as to how the world really works, but it is also the most powerful way to design our questions as to how the world really works. Here’s a quick example of what I mean, using a famous data set known as “Anscombe’s Quartet”.
In this original example (developed by hand by Frank Anscombe in 1973; today there’s an app for generating all the Anscombe sets you could want), Roman numerals I – IV refer to four data sets of 11 (x,y) coordinates, in other words 11 points on a simple two-dimensional plane. If you were comparing these four sets of numbers using traditional statistical methods, you might well think that they were four separate data measurements of exactly the same phenomenon. After all, the mean of x is exactly the same in each set of measurements (9), the mean of y is the same in each set of measurements to two decimal places (7.50), the variance of x is exactly the same in each set (11), the variance of y is the same in each set to within about 0.005 (roughly 4.12 to 4.13), the correlation between x and y is the same in each set to three decimal places (0.816), and if you run a linear regression on each data set you get the same line plotted through the observations (y = 3.00 + 0.500x).
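Those summary numbers are easy to check for yourself. Here is a minimal sketch using only the Python standard library and Anscombe’s published values for the four data sets; the `summarize` helper is our own illustration, not anything from Anscombe’s paper. It recomputes each statistic quoted above, using sample (n − 1) variances and an ordinary least-squares fit.

```python
from statistics import mean, variance

# Anscombe's (1973) quartet: sets I-III share the same x values.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
QUARTET = {
    "I":   (X,  [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X,  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X,  [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(xs, ys):
    """Return the summary statistics quoted in the text for one data set."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx                                   # least-squares slope
    return {
        "mean_x": mx, "mean_y": my,
        "var_x": variance(xs), "var_y": variance(ys),   # sample (n - 1) variances
        "r": sxy / (sxx * syy) ** 0.5,                  # Pearson correlation
        "intercept": my - slope * mx, "slope": slope,
    }

for name, (xs, ys) in QUARTET.items():
    s = summarize(xs, ys)
    print(f"{name:>3}: mean_x={s['mean_x']:.2f} mean_y={s['mean_y']:.2f} "
          f"var_x={s['var_x']:.2f} var_y={s['var_y']:.3f} r={s['r']:.3f} "
          f"y={s['intercept']:.2f}+{s['slope']:.3f}x")
```

Every line of output agrees on the headline numbers; only the variance of y differs, and only in the third decimal place. Nothing in that table hints that the four data sets look nothing alike.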
But when you LOOK at these four data sets, they are totally alien to each other, with essentially no similarity in meaning or probable causal mechanism. Of the four, linear regression and our typical summary statistical efforts make sense for only the upper left data set. For the other three, applying our standard toolkit makes absolutely no sense. But we’d never know that — we’d never know how to ask the right questions about our data — if we didn’t eyeball it first.
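Eyeballing the data doesn’t require special tooling. As a rough sketch (the `ascii_scatter` helper below is our own hypothetical illustration, not a library function), a few lines of Python can render a data set as a text scatter plot. Applied to set II, it shows the smooth arch that no straight line can describe, even though set II’s summary statistics match set I’s.

```python
# Anscombe's data set II: same x values as set I, different y values.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def ascii_scatter(xs, ys, width=40, height=12, mark="*"):
    """Render (x, y) points as a text grid; the top row holds the largest y."""
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    x_span = (x_hi - x_lo) or 1   # avoid dividing by zero when x is constant
    y_span = (y_hi - y_lo) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate into a grid cell.
        col = round((x - x_lo) / x_span * (width - 1))
        row = round((y_hi - y) / y_span * (height - 1))
        grid[row][col] = mark
    rows = ["|" + "".join(r) for r in grid]
    return "\n".join(rows + ["+" + "-" * width])   # last line is the x axis

print(ascii_scatter(X, Y2))
```

Swapping in set III or set IV makes their problems just as obvious: a single outlier tilting an otherwise perfect line, and a single high-leverage point doing all the work.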
Okay, you might say, duly noted. From now on we will certainly look at a visual plot of our data before doing things like forcing a line through it and reporting summary statistics like r-squared and standard deviation as if they were trumpets of angels from on high. But how do you “see” multi-variate datasets? It’s one thing to imagine a line through a set of points on a plane, quite another to visualize a plane through a set of points in space, and impossible to imagine a cubic solid through a set of points in hyperspace. And how do you “see” embedded or invisible data dimensions, whether it’s an invisible market dimension like volatility or an invisible measurement dimension like time aggregation or an invisible statistical dimension like the underlying distribution of errors?
The fact is that looking at data is an art, not a science. There’s no single process, no single toolkit for success. It requires years of practice on top of an innate artist’s eye before you have a chance of being good at this, and it’s something that I’ve never seen a non-human intelligence accomplish successfully (I can’t tell you how happy I am to write that sentence). But just because it’s hard, just because it doesn’t come easily or naturally to people and machines alike … well, that doesn’t mean it’s not the most important thing in data-based truth-seeking.
Why is it so important to SEE data relationships? Because we’re human beings. Because we are biologically evolved and culturally trained to process information in this manner. Because — and this is the Tufte-inspired market axiom that I can’t emphasize strongly enough — the only investable ideas are visible ideas. If you can’t physically see it in the data, then it will never move you strongly enough to overcome the pleasant fictions that dominate our workaday lives, what Faust’s Tempter, the demon Mephistopheles, calls the “masquerade” and “the dance of mind.” Our similarity to Faust (who was a really smart guy, a man of Science with a capital S) is not that the Devil may soon pay us a visit and tempt us with all manner of magical wonders, but that we have already succumbed to the blandishments of easy answers and magical thinking. I mean, don’t get me started on Part Two, Act 1 of Goethe’s magnum opus, where the Devil introduces massive quantities of paper money to encourage inflationary pressures under a false promise of recovery in the real economy. No, I’m not making this up. That is the actual, non-allegorical plot of one of the best, smartest books in human history, now almost 200 years old.
So what I’m going to ask of you, dear reader, is to look at some pictures of market data, with the hope that seeing will indeed spark believing. Not as a temptation, but as a talisman against the same. Because when I tell you that the statistical correlation between the US dollar and the price of oil since Janet Yellen and Mario Draghi launched competitive monetary policies in mid-June of 2014 is -0.96, I can hear the yawns. I can also hear my own brain start to pose negative questions, because I’ve experienced way too many instances of statistical “evidence” that, like the Anscombe data sets, proved to be misleading at best. But when I show you what that correlation looks like …
© Bloomberg Finance L.P., for illustrative purposes only
I can hear you lean forward in your seat. I can hear my own brain start to whir with positive questions and ideas about how to explore this data further. This is what a -96% correlation looks like.
What you’re looking at in the green line is the Fed’s favored measure of what the US dollar buys around the world. It’s an index whose components are the dollar’s exchange rates against the currencies of a broad group of major US trading partners (hence a “broad dollar” index), with each component proportionally magnified or minimized by the size of that trading relationship (hence a “trade-weighted” index). That index is measured on the left-hand vertical axis. It started at a value of about 102 on June 18, 2014, when Janet Yellen announced a tightening bias for US monetary policy and a renewed focus on the full employment half of the Fed’s dual mandate. It peaked in late January of this year and has since declined to a current value of about 119, as first Japan and Europe called off the negative rate dogs (making their currencies go up against the dollar) and then Yellen completely backtracked on raising rates this year (making the dollar go down against all currencies). Monetary policy divergence with a hawkish Fed and a dovish rest-of-world makes the dollar go up. Monetary policy convergence with everyone a dove makes the dollar go down.
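To make “trade-weighted” concrete, here is a minimal sketch of how such an index can be chained forward one period at a time. It assumes a chained, weighted geometric average of exchange-rate changes, which is our understanding of how the Fed builds its broad index; the three currencies, their weights, and the exchange rates are purely hypothetical, and the real index uses many more currencies and weights that change over time.

```python
import math

def chain_index(prev_index, prev_rates, rates, weights):
    """Advance a trade-weighted dollar index by one period.

    Assumed form (a simplified chained geometric average):
        I_t = I_{t-1} * prod_j (e_{j,t} / e_{j,t-1}) ** w_j
    where e_j is the price of one US dollar in foreign currency j
    and the trade weights w_j sum to 1.
    """
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("trade weights must sum to 1")
    growth = math.prod((rates[c] / prev_rates[c]) ** w for c, w in weights.items())
    return prev_index * growth

# Hypothetical trade weights and exchange rates (foreign units per US dollar).
weights = {"EUR": 0.5, "JPY": 0.3, "CAD": 0.2}
day0 = {"EUR": 0.80, "JPY": 110.0, "CAD": 1.30}
day1 = {"EUR": 0.88, "JPY": 110.0, "CAD": 1.30}   # dollar up 10% vs. EUR only

index = chain_index(100.0, day0, day1, weights)
print(f"{index:.2f}")   # 1.1 ** 0.5 is about 1.0488, so the index rises to about 104.88
```

A 10% move against a partner with half the trade weight lifts the index by roughly 4.9%, which is the “magnified/minimized by the size of that trading relationship” idea in numbers.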
What you’re looking at in the magenta line is the upside-down price of West Texas Intermediate crude oil over the same time span, as measured by the right hand vertical axis. So on June 18, 2014 the spot price of WTI crude oil was over $100/barrel. That bottomed in the high $20s just as the trade-weighted broad dollar index peaked this year, and it’s been roaring back higher (lower in the inverse depiction) ever since. Now correlation may not imply causation, but as Ed Tufte is fond of saying, it’s a mighty big hint. I can SEE the consistent relationship between change in the dollar and change in oil prices, and that makes for a coherent, believable story about a causal relationship between monetary policy and oil prices.
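One negative question worth asking of any price correlation is whether it was computed on price levels or on price changes. Two series that merely trend in opposite directions will show a strongly negative correlation in levels even when their period-to-period moves are unrelated. The toy series below are hypothetical and built by hand to isolate that effect; they say nothing about how the dollar/oil figure above was computed, only why it is worth checking the changes as well as the levels.

```python
def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def diffs(series):
    """Period-over-period changes."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

# Hypothetical series: one trends up, the other trends down, and their
# period-to-period wiggles are deliberately unrelated to each other.
T = range(41)
up = [t + (0.5 if t % 2 == 0 else -0.5) for t in T]
down = [-t + [0.0, 0.5, 0.0, -0.5][t % 4] for t in T]

print(f"levels:  {pearson(up, down):+.3f}")
print(f"changes: {pearson(diffs(up), diffs(down)):+.3f}")
```

Here the levels correlation comes out at about -0.999 while the correlation of changes is exactly zero. A relationship that holds in the changes, which is what the chart lets you eyeball, is the more persuasive kind of hint.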
What is that causal narrative? It’s not just the mechanistic aspects of pricing, such that the inherent exchange value of things priced in dollars — whether it’s a barrel of oil or a Caterpillar earthmover — must by definition go down as the exchange value of the dollar itself goes up. More impactful, I think, is that for the past seven years investors have been well and truly trained to see every market outcome as the result of central bank policy, a training program administered by central bankers who now routinely and intentionally use forward guidance and placebo words to act on “the dance of mind” in classic Mephistophelean fashion. In effect, the causal relationship between monetary policy and oil prices is a self-fulfilling prophecy (or in the jargon du jour, a self-reinforcing behavioral equilibrium), a meta-example of what George Soros calls reflexivity and what a game theorist calls the Common Knowledge Game.
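The mechanistic half of that story is simple arithmetic. As a minimal sketch with hypothetical numbers, assume the price of a barrel in some other currency stays put while the dollar strengthens: then the dollar price has to fall, by 1 − 1/1.10 ≈ 9.1% for a 10% rise in the dollar. The function name is our own, for illustration only.

```python
def dollar_price_holding_foreign_price(usd_price, fx_before, fx_after):
    """Dollar price that keeps a commodity's foreign-currency price unchanged.

    Assumption: fx_* are quoted as foreign-currency units per US dollar
    (e.g. euros per dollar). Foreign price = usd_price * fx, so holding it
    fixed requires new_usd_price = usd_price * fx_before / fx_after.
    """
    return usd_price * fx_before / fx_after

# Hypothetical numbers: oil at $50/bbl, dollar goes from 0.80 to 0.88 euros (+10%).
new_price = dollar_price_holding_foreign_price(50.0, 0.80, 0.88)
print(f"${new_price:.2f} per barrel ({new_price / 50.0 - 1:+.1%})")
```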
The causal relationship of the dollar, i.e. monetary policy, to the price of oil is a reflection of the Narrative of Central Bank Omnipotence, nothing more and nothing less. And today that narrative is everything.
Here’s something smart that I read about this relationship between oil prices and monetary policy back in November 2014 when oil was north of $70/barrel:
I think that this monetary policy divergence is a very significant risk to markets, as there’s no direct martingale on how far monetary policy can diverge and how strong the dollar can get. As a result I think there’s a non-trivial chance that the price of oil could have a $30 or $40 handle at some point over the next 6 months, even though the global growth and supply/demand models would say that’s impossible. But I also think the likely duration of that heavily depressed price is pretty short. Why? Because the Fed and China will not take this lying down. They will respond to the stronger dollar and stronger yuan (China’s currency is effectively tied to the dollar) and they will prevail, which will push oil prices back close to what global growth says the price should be. The danger, of course, is that if they wait too long to respond (and they usually do), then the response will itself be highly damaging to global growth and market confidence and we’ll bounce back, but only after a near-recession in the US or a near-hard landing in China.
Oh wait, I wrote that. Good stuff.
But that was a voice in the wilderness in 2014, as the dominant narrative for the causal factors driving oil pricing was all OPEC all the time. So what about that, Ben? What about the steel cage death match within OPEC between Saudi Arabia and Iran and outside of OPEC between Saudi Arabia and US frackers? What about supply and demand? Where is that in your price chart of oil? Sorry, but I don’t see it in the data. Doesn’t mean it’s not really there. Doesn’t mean it’s not a statistically significant data relationship. What it means is that the relationship between oil supply and oil prices in a policy-controlled market is not an investable relationship. I’m sure it used to be, which is why so many people believe that it’s so important to follow and fret over. But today it’s an essentially useless exercise in data analytics. Not wrong, but useless … there’s a difference!
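The gap between “statistically significant” and “investable” can be made concrete. As a hypothetical illustration (our numbers, not drawn from any oil data), the standard t-test for a Pearson correlation shows that with enough observations even a weak correlation clears the usual 5% bar (a two-sided critical value of roughly 1.96 in large samples) while explaining only a sliver of the variance.

```python
import math

def correlation_t_stat(r, n):
    """t-statistic for testing zero correlation: r * sqrt(n - 2) / sqrt(1 - r**2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical: a weak correlation measured over ~500 daily observations.
r, n = 0.10, 500
t = correlation_t_stat(r, n)
print(f"t = {t:.2f}, share of variance explained (r^2) = {r ** 2:.0%}")
```

A t-statistic of about 2.24 is “significant,” yet the relationship accounts for about 1% of the variation. Real, but not something you can see, and not something you can invest on.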
Of course, crude oil isn’t the only place where fundamental supply and demand factors are invisible in the data and hence essentially useless as an investable attribute. Here’s the dollar and something near and dear to the hearts of anyone in Houston, the Alerian MLP index, with an astounding -94% correlation:
© Bloomberg Finance L.P., for illustrative purposes only
Interestingly, the correlation between the Alerian MLP index and oil is noticeably weaker, at 88% in magnitude. Hard to believe that MLP investors should be paying more attention to Bank of Japan press conferences than to gas field depletion schedules, but I gotta call ’em like I see ’em.
And here’s the dollar and EEM, the dominant emerging market ETF, with a -89% correlation:
© Bloomberg Finance L.P., for illustrative purposes only
There’s only one question that matters about Emerging Markets as an asset class, and it’s the subject of one of my first (and most popular) Epsilon Theory notes, “It Was Barzini All Along”: are Emerging Market growth rates a function of something (anything!) particular to Emerging Markets, or are they simply a derivative function of Developed Market central bank liquidity measures and monetary policy? Certainly this chart suggests a rather definitive answer to that question!
And finally, here’s the dollar and the US Manufacturing PMI survey of real-world corporate purchasing managers, probably the most respected measure of US manufacturing sector health. This data relationship clocks in at a -92% correlation. I mean … this is nuts.
© Bloomberg Finance L.P., for illustrative purposes only
Here’s what I wrote last summer about the inexorable spread of monetary policy contagion:
Monetary policy divergence manifests itself first in currencies, because currencies aren’t an asset class at all, but a political construction that represents and symbolizes monetary policy. Then the divergence manifests itself in those asset classes, like commodities, that have no internal dynamics or cash flows and are thus only slightly removed in their construction and meaning from however they’re priced in this currency or that. From there the divergence spreads like a cancer (or like a cure for cancer, depending on your perspective) into commodity-sensitive real-world companies and national economies. Eventually – and this is the Big Point – the divergence spreads into everything, everywhere.
I think this is still the only story that matters for markets.
The good Lord giveth and the good Lord taketh away. Right now the good Lord’s name is Janet Yellen, and she’s in a giving mood. It won’t last. It never does. But it does give us time to prepare our portfolios for a return to competitive monetary policy actions, and it gives us insight into what to look for as catalysts for that taketh away part of the equation.
Most importantly, though, I hope that this exercise in truth-seeking inoculates you from the Big Narrative Lie coming soon to a status quo media megaphone near you, that this resurgence in risk assets is caused by a resurgence in fundamental real-world economic factors. I know you want to believe this is true. I do, too! It’s unpleasant personally and bad for business in 2016 to accept the reality that we are mired in a policy-controlled market, just as it was unpleasant personally and bad for business in 1854 to accept the reality that cholera is transmitted through fecal contamination of drinking water. But when you SEE John Snow’s dot map of death you can’t ignore the Broad Street water pump smack-dab in the middle of disease outcomes. When you SEE a Bloomberg correlation map of prices you can’t ignore the trade-weighted broad dollar index smack-dab in the middle of market outcomes. Or at least you can’t ignore it completely. It took another 20 years and a lot more cholera deaths before Snow’s ideas were widely accepted. It took the development of a new intellectual foundation: germ theory. I figure it will take another 20 years and the further development of game theory before we get widespread acceptance of the ideas I’m talking about in Epsilon Theory. That’s okay. The bees can wait.